Preface


“In almost all textbooks, even the best, this 
principle is presented so that it is impossible to 
understand.” (K. Jacobi, Lectures on Dynamics,
1842-1843). I have not chosen to break with 
tradition. 


V.I. Arnold, Mathematical Methods of Classical 
Mechanics (1980), footnote on p. 246 


There has been a remarkable revival of interest in classical me- 
chanics in recent years. We now know that there is much more 
to classical mechanics than previously suspected. The behavior of 
classical systems is surprisingly rich; derivation of the equations of 
motion, the focus of traditional presentations of mechanics, is just 
the beginning. Classical systems display a complicated array of 
phenomena such as non-linear resonances, chaotic behavior, and 
transitions to chaos. 

Traditional treatments of mechanics concentrate most of their 
effort on the extremely small class of symbolically tractable dy- 
namical systems. We concentrate on developing general methods 
for studying the behavior of systems, whether or not they have 
a symbolic solution. Typical systems exhibit behavior that is 
qualitatively different from the solvable systems and surprisingly 
complicated. We focus on the phenomena of motion, and we make 
extensive use of computer simulation to explore this motion. 

Even when a system is not symbolically tractable, the tools of
modern dynamics allow one to extract a qualitative understanding.
Rather than concentrating on symbolic descriptions, we concentrate
on geometric features of the set of possible trajectories.
Such tools provide a basis for the systematic analysis of numerical
or experimental data.

Classical mechanics is deceptively simple. It is surprisingly easy 
to get the right answer with fallacious reasoning or without real 
understanding. Traditional mathematical notation contributes 
to this problem. Symbols have ambiguous meanings, which depend
on context, and often even change within a given context.¹
For example, a fundamental result of mechanics is the Lagrange 
equations. Using traditional notation the Lagrange equations are 
written 


$$\frac{d}{dt}\frac{\partial L}{\partial \dot q^i} - \frac{\partial L}{\partial q^i} = 0.$$


The Lagrangian L must be interpreted as a function of the position
and velocity components $q^i$ and $\dot q^i$, so that the partial
derivatives make sense, but then in order for the time derivative
d/dt to make sense, solution paths must have been inserted into the
partial derivatives of the Lagrangian to make functions of time.
The traditional use of ambiguous notation is convenient in simple 
situations, but in more complicated situations it can be a serious 
handicap to clear reasoning. In order that the reasoning be clear 
and unambiguous, we have adopted a more precise mathematical 
notation. Our notation is functional and follows that of modern 
mathematical presentations.²

Computation also enters into the presentation of the mathe- 
matical ideas underlying mechanics. We require that our mathe- 
matical notations be explicit and precise enough so that they can 


1 In his book on mathematical pedagogy [15], Hans Freudenthal argues that
the reliance on ambiguous, unstated notational conventions in such expressions
as f(x) and df(x)/dx makes mathematics, and especially introductory calculus,
extremely confusing for beginning students; and he enjoins mathematics
educators to use more formal modern notation.


2 In his beautiful book Calculus on Manifolds (1965), Michael Spivak uses
functional notation. On p. 44 he discusses some of the problems with classical
notation. We excerpt a particularly juicy quote:


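The mere statement of [the chain rule] in classical notation requires the
introduction of irrelevant letters. The usual evaluation for $D_1(f \circ (g,h))$
runs as follows:

If f(u,v) is a function and u = g(x,y) and v = h(x,y) then

$$\frac{\partial f(g(x,y),h(x,y))}{\partial x} = \frac{\partial f(u,v)}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f(u,v)}{\partial v}\frac{\partial v}{\partial x}$$

[The symbol $\partial u/\partial x$ means $\partial/\partial x\, g(x,y)$, and $\partial/\partial u\, f(u,v)$ means
$D_1 f(u,v) = D_1 f(g(x,y), h(x,y))$.] This equation is often written simply

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$$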


Note that f means something different on the two sides of the equation! 
be interpreted automatically, as by a computer. As a consequence
of this requirement the formulas and equations that appear in the 
text stand on their own. They have clear meaning, independent of 
the informal context. For example, we write Lagrange’s equations 
in functional notation as follows:³


$$D(\partial_2 L \circ \Gamma[q]) - \partial_1 L \circ \Gamma[q] = 0$$


The Lagrangian L is a real-valued function of time t, coordinates
x, and velocities v; the value is L(t, x, v). Partial derivatives
are indicated as derivatives of functions with respect to particular
argument positions; $\partial_2 L$ indicates the function obtained by
taking the partial derivative of the Lagrangian function L with
respect to the velocity argument position. The traditional partial
derivative notation, which employs a derivative with respect to a
“variable,” depends on context and can lead to ambiguity.⁴ The
partial derivatives of the Lagrangian are then explicitly evaluated 
along a path function q. The time derivative is taken and the 
Lagrange equations formed. Each step is explicit; there are no 
implicit substitutions. 
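A minimal numerical sketch of this functional reading may help. It assumes a unit-mass harmonic oscillator, L(t, x, v) = v²/2 − x²/2 (an example not taken from the text), and replaces exact derivatives with central differences. The helper names (`partial`, `D`, `Gamma`, `compose`, `lagrange_residual`) are hypothetical, chosen only to mirror the notation; argument positions are numbered 0 (time), 1 (coordinate), 2 (velocity), so `partial(L, 2)` plays the role of ∂₂L.

```python
import math

H = 1e-4  # finite-difference step (an illustrative choice)


def lagrangian(t, x, v):
    """Assumed example: unit-mass harmonic oscillator, L = v^2/2 - x^2/2."""
    return 0.5 * v * v - 0.5 * x * x


def partial(f, i):
    """Partial derivative of f with respect to argument position i
    (0 = time, 1 = coordinate, 2 = velocity), by central difference."""
    def df(*args):
        up, down = list(args), list(args)
        up[i] += H
        down[i] -= H
        return (f(*up) - f(*down)) / (2 * H)
    return df


def D(g):
    """Derivative of a real function of one real argument."""
    return lambda t: (g(t + H) - g(t - H)) / (2 * H)


def Gamma(q):
    """Local tuple t -> (t, q(t), Dq(t)), truncated after the velocity."""
    Dq = D(q)
    return lambda t: (t, q(t), Dq(t))


def compose(f, g):
    """f o g, unpacking the local tuple into f's argument positions."""
    return lambda t: f(*g(t))


def lagrange_residual(L, q):
    """t -> D(d2L o Gamma[q])(t) - (d1L o Gamma[q])(t)."""
    d2L_along_q = compose(partial(L, 2), Gamma(q))
    d1L_along_q = compose(partial(L, 1), Gamma(q))
    D_d2L = D(d2L_along_q)
    return lambda t: D_d2L(t) - d1L_along_q(t)


print(lagrange_residual(lagrangian, math.cos)(0.7))         # close to 0
print(lagrange_residual(lagrangian, lambda t: t * t)(1.0))  # close to 3
```

For this Lagrangian the residual reduces to q″ + q, so it vanishes along q = cos and equals 2 + t² along q(t) = t². Each step (differentiate L by argument position, evaluate along Γ[q], then differentiate in time) is an explicit function application, with no implicit substitution.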

Computational algorithms are used to communicate precisely 
some of the methods used in the analysis of dynamical phenomena. 
Expressing the methods of variational mechanics in a computer 
language forces them to be unambiguous and computationally 
effective. Computation requires us to be precise about the repre- 
sentation of mechanical and geometric notions as computational 
objects and permits us to represent explicitly the algorithms for 
manipulating these objects. Also, once formalized as a procedure, 
a mathematical idea becomes a tool that can be used directly to 
compute results. 

Active exploration on the part of the student is an essential 
part of the learning experience. Our focus is on understanding 
the motion of systems; to learn about motion the student must 
actively explore the motion of systems through simulation and 


3 This is presented here without explanation, to give the flavor of the notation.
The text gives a full explanation. 


4 “It is necessary to use the apparatus of partial derivatives, in which even the
notation is ambiguous.” From V.I. Arnold, Mathematical Methods of Classical
Mechanics (1980), Section 47, p. 258. See also the footnote on that page.
experiment. The exercises and projects are an integral part of the
presentation. 

That the mathematics is precise enough to be interpreted au- 
tomatically allows active exploration to be extended to the math- 
ematics. The requirement that the computer be able to inter- 
pret any expression provides strict and immediate feedback as 
to whether the expression is correctly formulated. Experience 
demonstrates that interaction with the computer in this way un- 
covers and corrects many deficiencies in understanding. 

This book presents classical mechanics from an unusual per- 
spective. It focuses on understanding motion rather than deriving 
equations of motion. It weaves recent discoveries of nonlinear dy- 
namics throughout the presentation, rather than presenting them 
as an afterthought. It uses functional mathematical notation that 
allows precise understanding of fundamental properties of classical 
mechanics. It uses computation to constrain notation, to capture 
and formalize methods, for simulation, and for symbolic analysis. 

This book is the result of teaching classical mechanics at MIT 
for the past six years. The contents of our class began with ideas 
from a class on nonlinear dynamics and solar system dynamics by 
Wisdom and ideas about how computation can be used to formu- 
late methodology developed in the introductory computer science 
class by Abelson and Sussman. When we started we expected that 
using this approach to formulate mechanics would be easy. We
quickly learned, though, that there were many things we thought we
understood that we did not in fact understand. Our requirement
that our mathematical notations be explicit and precise enough 
so that they can be interpreted automatically, as by a computer, 
is very effective in uncovering puns and flaws in reasoning. The 
resulting struggle to make the mathematics precise, yet clear and 
computationally effective, lasted far longer than we anticipated. 
We learned a great deal about both mechanics and computation 
by this process. We hope others, especially our competitors, will 
adopt these methods that enhance understanding, while slowing 
research. 
Lagrangian Mechanics


The purpose of mechanics is to describe how 
bodies change their position in space with “time.” 
I should load my conscience with grave sins against 
the sacred spirit of lucidity were I to formulate the 
aims of mechanics in this way, without serious 
reflection and detailed explanations. Let us 
proceed to disclose these sins. 


Albert Einstein, Relativity, the Special and General
Theory (1961), p. 9.


The subject of this book is motion, and the mathematical tools 
used to describe it. 

Centuries of careful observations of the motions of the planets 
revealed regularities in those motions, allowing accurate predic- 
tions of phenomena such as eclipses and conjunctions. The effort 
to formulate these regularities and ultimately to understand them 
led to the development of mathematics and to the discovery that 
mathematics could be effectively used to describe aspects of the 
physical world. That mathematics can be used to describe natural 
phenomena is a remarkable fact. 

When a juggler throws a pin it takes a rather predictable path 
and it rotates in a rather predictable way. In fact, the skill of jug- 
gling depends crucially on this predictability. It is also a remark- 
able discovery that the same mathematical tools used to describe 
the motions of the planets can be used to describe the motion of 
the juggling pin. 

Classical mechanics describes the motion of a system of par- 
ticles, subject to forces describing their interactions. Complex 
physical objects, such as juggling pins, can be modeled as myriad 
particles with fixed spatial relationships maintained by stiff forces 
of interaction. 

There are many conceivable ways a system could move that 
never occur. We can imagine that the juggling pin might pause 
in midair or go fourteen times around the head of the juggler be- 
fore being caught, but these motions do not happen. How can 
we distinguish motions of a system that can actually occur from
other conceivable motions? Perhaps we can invent some mathe- 
matical function that allows us to distinguish realizable motions 
from among all conceivable motions. 

The motion of a system can be described by giving the position 
of every piece of the system at each moment. Such a description of 
the motion of the system is called a configuration path; the config- 
uration path specifies the configuration as a function of time. The 
juggling pin rotates as it flies through the air; the configuration of 
the juggling pin is specified by giving the position and orientation 
of the pin. The motion of the juggling pin is specified by giving 
the position and orientation of the pin as a function of time. 

The function that we seek takes a configuration path as an 
input and produces some output. We want this function to have 
some characteristic behavior when the input is a realizable path. 
For example, the output could be a number, and we could try to 
arrange that the number is zero only on realizable paths. Newton’s 
equations of motion are of this form; at each moment Newton’s 
differential equations must be satisfied. 

However, there is an alternate strategy that provides more
insight and power: we could look for a path-distinguishing function
that has a minimum on the realizable paths; on nearby unrealizable
paths the value of the function is higher than it is on the
realizable path. This is the variational strategy: for each physical
system we invent a path-distinguishing function that distinguishes
realizable motions of the system by having a stationary point for
each realizable path.¹ For a great variety of systems, realizable
motions of the system can be formulated in terms of a variational
principle.²
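As a concrete, assumed illustration of a path-distinguishing function with a minimum on a realizable path, the Python sketch below integrates v²/2 − x²/2 (a unit-mass harmonic oscillator, not an example from the text) along x(t) = cos t for 0 ≤ t ≤ 1, and along nearby paths cos t + ε sin(πt) that share its endpoints. The integral anticipates the construction of section 1.1; Simpson's rule and the particular perturbation are arbitrary choices.

```python
import math


def lagrangian(t, x, v):
    """Assumed example: unit-mass harmonic oscillator, L = v^2/2 - x^2/2."""
    return 0.5 * v * v - 0.5 * x * x


def action(q, dq, t1, t2, n=1000):
    """Integral of L(t, q(t), dq(t)) over [t1, t2] by the composite
    Simpson rule; n must be even."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n + 1):
        t = t1 + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * lagrangian(t, q(t), dq(t))
    return total * h / 3


t1, t2 = 0.0, 1.0
s_true = action(math.cos, lambda t: -math.sin(t), t1, t2)  # realizable path


def perturbed_action(eps):
    """Action of cos t + eps*sin(pi t); the added term vanishes at t = 0
    and t = 1, so the endpoints are unchanged."""
    def q(t):
        return math.cos(t) + eps * math.sin(math.pi * t)

    def dq(t):
        return -math.sin(t) + eps * math.pi * math.cos(math.pi * t)

    return action(q, dq, t1, t2)


for eps in (0.1, -0.1, 0.02):
    print(eps, perturbed_action(eps) - s_true)  # positive in each case
```

For this system the increase works out to ε²(π² − 1)/4 for either sign of ε. Over intervals longer than π the same oscillator path is no longer a minimum, which is one reason the principle is phrased in terms of stationarity.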


1 A stationary point of a function is a point where the function’s value does not
vary, to first order, as the input is varied. Local maxima or minima are stationary points.


2 The variational formulation successfully describes all of the Newtonian
mechanics of particles and rigid bodies. The variational formulation has also
been usefully applied in the description of many other systems such as classi- 
cal electrodynamics, the dynamics of inviscid fluids, and the design of mech- 
anisms such as four-bar linkages. In addition, modern formulations of quan- 
tum mechanics and quantum field theory build on many of the same con- 
cepts. However, the variational formulation does not appear to apply to all 
dynamical systems. For example, there is no simple prescription to apply 
the variational apparatus to systems with dissipation, though in special cases 
variational methods still apply. 


Mechanics, as invented by Newton and his contemporaries, de- 
scribes the motion of a system in terms of the positions, velocities, 
and accelerations of each of the particles in the system. In contrast 
to the Newtonian formulation of mechanics, the variational formu- 
lation of mechanics describes the motion of a system in terms of 
aggregate quantities that are associated with the motion of the 
system as a whole. 

In the Newtonian formulation the forces can often be written 
as derivatives of the potential energy of the system. The motion 
of the system is determined by considering how the individual 
component particles respond to these forces. The Newtonian for- 
mulation of the equations of motion is intrinsically a particle-by- 
particle description. 

In the variational formulation the equations of motion are for- 
mulated in terms of the difference of the kinetic energy and the 
potential energy. The potential energy is a number that is char- 
acteristic of the arrangement of the particles in the system; the 
kinetic energy is a number that is determined by the velocities of 
the particles in the system. Neither the potential energy nor the
kinetic energy depends on how those positions and velocities are
specified. The difference is characteristic of the system as a whole 
and does not depend on the details of how the system is specified. 
So we are free to choose ways of describing the system that are 
easy to work with; we are liberated from the particle-by-particle 
description inherent in the Newtonian formulation. 
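The claim that T − V does not depend on how positions and velocities are specified can be spot-checked numerically. The Python sketch below assumes a particle of mass 2 moving in a plane under an arbitrary, made-up potential, writes the Lagrangian once in rectangular coordinates and once in polar coordinates (r, φ), and evaluates both at the same physical state.

```python
import math

M = 2.0  # assumed mass


def potential(x, y):
    """An arbitrary potential energy, made up for this check."""
    return 0.5 * (x * x + y * y) + 0.3 * x


def L_rect(t, q, v):
    """T - V with rectangular coordinates q = (x, y), v = (vx, vy)."""
    x, y = q
    vx, vy = v
    return 0.5 * M * (vx * vx + vy * vy) - potential(x, y)


def L_polar(t, q, qdot):
    """T - V with polar coordinates q = (r, phi), qdot = (rdot, phidot)."""
    r, phi = q
    rdot, phidot = qdot
    x, y = r * math.cos(phi), r * math.sin(phi)
    return 0.5 * M * (rdot ** 2 + (r * phidot) ** 2) - potential(x, y)


def polar_to_rect_state(q, qdot):
    """Rectangular position and velocity of the same physical state."""
    r, phi = q
    rdot, phidot = qdot
    c, s = math.cos(phi), math.sin(phi)
    position = (r * c, r * s)
    velocity = (rdot * c - r * phidot * s, rdot * s + r * phidot * c)
    return position, velocity


q, qdot = (1.5, 0.4), (-0.2, 0.9)
xy, vxy = polar_to_rect_state(q, qdot)
print(L_polar(0.0, q, qdot), L_rect(0.0, xy, vxy))  # equal up to rounding
```

The two functions have different forms (the angular rate enters through r²φ̇²), yet they agree wherever the states correspond, which is what lets us pick whichever coordinates are easiest to work with.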

The variational formulation has numerous advantages over the 
Newtonian formulation. The equations of motion for those param- 
eters that describe the state of the system are derived in the same 
way regardless of the choice of those parameters: the method of 
formulation does not depend on the choice of coordinate system. 
If there are positional constraints among the particles of a system 
the Newtonian formulation requires that we consider the forces 
maintaining these constraints, whereas in the variational formu- 
lation the constraints can be built into the coordinates. The vari- 
ational formulation reveals the association of conservation laws 
with symmetries. The variational formulation provides a frame- 
work for placing any particular motion of a system in the context 
of all possible motions of the system. We pursue the variational 
formulation because of these advantages. 
1.1 The Principle of Stationary Action


Let us suppose that for each physical system there is a path- 
distinguishing function that is stationary on realizable paths. We 
will try to deduce some of its properties. 


Experience of motion 

Our ordinary experience suggests that physical motion can be
described by configuration paths that are continuous and smooth.³
We do not see the juggling pin jump from one place to another. 
Nor do we see the juggling pin suddenly change the way it is mov- 
ing. 

Our ordinary experience suggests that the motion of physical 
systems does not depend upon the entire history of the system. 
If we enter the room after the juggling pin has been thrown into 
the air we cannot tell when it left the juggler’s hand. The juggler 
could have thrown the pin from a variety of places at a variety 
of times with the same apparent result as we walk in the door.⁴
So the motion of the pin does not depend on the details of the 
history. 

Our ordinary experience suggests that the motion of physical 
systems is deterministic. In fact, a small number of parameters 
summarize the important aspects of the history of the system and 
determine the future evolution of the system. For example, at 
any moment the position, velocity, orientation and rate of change 
of the orientation of the juggling pin are enough to completely 
determine the future motion of the pin. 


Realizable paths 

From our experience of motion we develop certain expectations 
about realizable configuration paths. If a path is realizable, then 
any segment of the path is a realizable path segment. Conversely, 
a path is realizable if every segment of the path is a realizable 


3 Experience with systems on an atomic scale suggests that at this scale systems 
do not travel along well-defined configuration paths. To describe the evolution 
of systems on the atomic scale we employ quantum mechanics. Here, we 
restrict attention to systems for which the motion is well described by a smooth 
configuration path. 


4 Extrapolation of the orbit of the Moon backward in time cannot determine
the point at which the Moon was placed on this trajectory. To determine 
the origin of the Moon we must supplement dynamical evidence with other 
physical evidence such as chemical compositions. 


path segment. The realizability of a path segment depends on 
all points of the path in the segment. The realizability of a path 
segment depends on every point of the path segment in the same 
way; no part of the path is special. The realizability of a path 
segment depends only on points of the path within the segment; 
the realizability of a path segment is a local property. 

So the path-distinguishing function aggregates some local prop- 
erty of the system measured at each moment along the path seg- 
ment. Each moment along the path must be treated the same way. 
The contributions from each moment along the path segment must 
be combined in a way that maintains the independence of the con- 
tributions from disjoint subsegments. One method of combination 
that satisfies these requirements is to add up the contributions, 
making the path-distinguishing function an integral over the path 
segment of some local property of the path.⁵
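The additivity requirement can be seen in a small numerical sketch. Here the local property is assumed, for illustration only, to be the squared rate of change of the path γ(t) = sin t; integrating it over a segment gives the same total as summing the integrals over two disjoint subsegments that make up that segment.

```python
import math


def integrate(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def local_property(t):
    """Assumed local property of the path gamma(t) = sin t: the square of
    its rate of change, cos(t)^2, evaluated at the moment t."""
    return math.cos(t) ** 2


whole = integrate(local_property, 0.0, 2.0)
split = integrate(local_property, 0.0, 0.5) + integrate(local_property, 0.5, 2.0)
print(whole, split)  # agree to rounding; exact value is 1 + sin(4)/4
```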

So we will try to arrange that the path-distinguishing func- 
tion, constructed as an integral of a local property along the path, 
assumes a stationary value for any realizable path. Such a
path-distinguishing function is traditionally called an action for the
system. We use the word “action” to be consistent with common 
usage. Perhaps it would be clearer to continue to call it “path- 
distinguishing function,” but then it would be more difficult for 
others to know what we were talking about.⁶

In order to pursue the agenda of variational mechanics, we must 
invent action functions that are stationary on the realizable tra- 
jectories of the systems we are studying. We will consider actions 
that are integrals of some local property of the configuration path 
at each moment. Let γ be the configuration-path function; γ(t)


5 We suspect that this argument can be promoted to a precise constraint on
the possible ways of making this path-distinguishing function. 


6 Historically, Huygens was the first to use the term “action” in mechanics. He
used the term to refer to “the effect of a motion.” This is an idea that came 
from the Greeks. In his manuscript “Dynamica” (1690) Leibnitz enunciated a 
“Least Action Principle” using the “harmless action,” which was the product 
of mass, velocity, and the distance of the motion. Leibnitz also spoke of a 
“violent action” in the case where things collided. 


is the configuration at time t. The action of the segment of the
path γ in the time interval from t₁ to t₂ is⁷

$$S[\gamma](t_1, t_2) = \int_{t_1}^{t_2} F[\gamma], \qquad (1.1)$$

where F[γ] is a function of time that measures some local property
of the path. It may depend upon the value of the function γ at
that time and the value of any derivatives of γ at that time.⁸

The configuration path can be locally described at a moment in 
terms of the configuration, the rate of change of the configuration, 
and all the higher derivatives of the configuration at the given 
moment. Given this information the path can be reconstructed in 
some interval containing that moment.⁹ Local properties of paths
can depend on no more than the local description of the path. 
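A short sketch of that reconstruction, assuming the path is real analytic (footnote 9 gives the caveat): from the value and derivatives of the assumed path γ(t) = sin t at t₀ = 0.3, a truncated power series recovers the path at a nearby time. The flat function exp(−1/t²) of footnote 9 shows the failure mode: all of its derivatives at 0 vanish, so the same procedure predicts 0 everywhere. The helper `taylor_reconstruct` is a hypothetical name.

```python
import math


def taylor_reconstruct(derivs, t0):
    """Truncated power series built from [f(t0), f'(t0), f''(t0), ...]."""
    def approx(t):
        return sum(d * (t - t0) ** k / math.factorial(k)
                   for k, d in enumerate(derivs))
    return approx


# Assumed path gamma(t) = sin t: its derivatives at t0 cycle through
# sin, cos, -sin, -cos.
t0 = 0.3
cycle = [math.sin(t0), math.cos(t0), -math.sin(t0), -math.cos(t0)]
gamma_near = taylor_reconstruct([cycle[k % 4] for k in range(20)], t0)
print(gamma_near(0.8), math.sin(0.8))  # agree closely

# Footnote 9's flat function: every derivative at 0 is zero, so the series
# predicts 0 at t = 0.5, although exp(-1/0.25) is about 0.018.
flat_series = taylor_reconstruct([0.0] * 20, 0.0)
print(flat_series(0.5), math.exp(-1 / 0.5 ** 2))
```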

The function F measures some local property of the configuration
path γ. We can decompose F[γ] into two parts: a part that
measures some property of a local description and a part that
extracts a local description of the path from the path function. The
function that measures the local property of the system depends
on the particular physical system; the method of construction of a
local description of a path from a path is the same for any system.
We can write F[γ] as a composition of these two functions:¹⁰

$$F[\gamma] = L \circ \Gamma[\gamma]. \qquad (1.2)$$
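The decomposition can be mirrored directly with higher-order functions. In this Python sketch, which assumes a free particle of mass 3 and approximates the velocity with a central difference, `Gamma(path)` returns the local-tuple function truncated after the velocity, and F[γ] is literally the composition of L with it. All names are hypothetical.

```python
H = 1e-5  # finite-difference step (an illustrative choice)


def Gamma(path):
    """Return t -> (t, path(t), Dpath(t)): the local tuple truncated after
    the velocity, with the derivative estimated by a central difference."""
    def local_tuple(t):
        velocity = (path(t + H) - path(t - H)) / (2 * H)
        return (t, path(t), velocity)
    return local_tuple


def compose(f, g):
    """(f o g)(t) = f(g(t))."""
    return lambda t: f(g(t))


def L(local):
    """Assumed system: free particle of mass 3. L receives the whole local
    tuple and uses only the velocity component."""
    t, x, v = local
    return 0.5 * 3.0 * v * v


def F(path):
    """F[path] = L o Gamma[path], a function of time."""
    return compose(L, Gamma(path))


f = F(lambda t: 2.0 * t + 1.0)  # uniform motion at speed 2
print(f(0.7))  # close to 0.5 * 3 * 2**2 = 6
```

Because L receives the whole tuple as a single argument, `compose` needs no special handling; the system-specific part (L) and the uniform part (Gamma) stay separate, as the text describes.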


7 A definite integral of a real-valued function f of a real argument is written
$\int_a^b f$. This can also be written $\int_a^b f(x)\,dx$. The first notation emphasizes that
a function is being integrated.


8 Traditionally, square brackets are put around functional arguments. In this
case, the square brackets remind us that the value of S may depend on the
function γ in complicated ways, such as through its derivatives.


9 In the case of a real-valued function the value of the function and its
derivatives at some point can be used to construct a power series. For sufficiently
nice functions (real analytic) the power series constructed in this way
converges in some interval containing the point. Not all functions can be locally
represented in this way. For example, the function f(x) = exp(−1/x²), with
f(0) = 0, is zero and has all derivatives zero at x = 0, but this infinite number
of derivatives is insufficient to determine the function value at any other point.


10 Here ∘ denotes composition of functions: (f ∘ g)(t) = f(g(t)). In our notation
the application of a path-dependent function to its path is of higher precedence
than the composition, so L ∘ Γ[γ] = L ∘ (Γ[γ]).
The function Γ takes the path and produces a function of time.
Its value is an ordered tuple containing the time, the configuration
at that time, the rate of change of the configuration at that time,
and the values of higher derivatives of the path evaluated at that
time. For the path γ and time t:¹¹

$$\Gamma[\gamma](t) = (t, \gamma(t), D\gamma(t), \ldots) \qquad (1.3)$$


We refer to this tuple, which includes as many derivatives as are 
needed, as the local tuple. 
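To see a local tuple with more than one derivative, the Python sketch below restricts attention to polynomial paths, an assumption made only so that every derivative is exact. `Gamma(coeffs, n_derivs)` (a hypothetical name) returns the tuple (t, γ(t), Dγ(t), ..., Dⁿγ(t)), holding as many derivatives as requested.

```python
def poly_eval(coeffs, t):
    """Evaluate c0 + c1 t + c2 t^2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def Gamma(coeffs, n_derivs):
    """Local tuple t -> (t, y(t), Dy(t), ..., D^n y(t)) for the polynomial
    path y with the given coefficients, keeping n_derivs derivatives."""
    def local_tuple(t):
        components = [t]
        c = list(coeffs)
        for _ in range(n_derivs + 1):
            components.append(poly_eval(c, t))
            c = poly_derivative(c)
        return tuple(components)
    return local_tuple


# y(t) = 1 + 2t + 3t^2, keeping three derivatives:
print(Gamma([1, 2, 3], 3)(2.0))  # (2.0, 17.0, 14.0, 6.0, 0.0)
```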

The function L depends on the specific details of the physical
system being investigated, but does not depend on any particular
configuration path. The function L computes a real-valued local
property of the path. We will find that L needs only a finite number
of components of the local tuple to compute this property.
The path can be locally reconstructed from the full local description;
that L depends on a finite number of components of the local
tuple guarantees that it measures a local property.¹²

The advantage of this decomposition is that the local descrip- 
tion of the path is computed by a uniform process from the con- 
figuration path, independent of the system being considered. All 
of the system-specific information is captured in the function L.

The function L is called a Lagrangian¹³ for the system, and the
resulting action,

$$S[\gamma](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[\gamma], \qquad (1.4)$$

11 The derivative Dγ of a configuration path γ can be defined in terms of
ordinary derivatives by specifying how it acts on sufficiently smooth real- 
valued functions f of configurations. The exact definition is unimportant at 
this stage. If you are curious see footnote 23. 


12 We will later discover that an initial segment of the local tuple will be
sufficient to determine the future evolution of the system. That a configuration 
and a finite number of derivatives determines the future means that there is 
a way of determining all of the rest of the derivatives of the path from the 
initial segment. 


13 The classical Lagrangian plays a fundamental role in the path-integral
formulation of quantum mechanics (due to Dirac and Feynman), where the complex
exponential of the classical action yields the relative probability amplitude
for a path. The Lagrangian is the starting point for the Hamiltonian
formulation of mechanics (discussed in chapter 3), which is also essential in
the Schrödinger and Heisenberg formulations of quantum mechanics and in
the Boltzmann-Gibbs approach to statistical mechanics.
is called the Lagrangian action. Lagrangians can be found for a
great variety of systems. We will see that for many systems the 
Lagrangian can be taken to be the difference between kinetic and 
potential energy. Such Lagrangians depend only on the time, the 
configuration, and the rate of change of the configuration. We will 
focus on this class of systems, but will also consider more general 
systems from time to time. 
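For a Lagrangian of the form T − V, the action (1.4) can be approximated by quadrature once a path and its velocity are known. The sketch below assumes a unit mass on a unit-stiffness spring and the path x(t) = cos t; along this path L = −½ cos 2t, so the action over [0, 1] should come out to −sin(2)/4. The velocity is supplied analytically rather than computed, a simplification of Γ made for brevity.

```python
import math


def lagrangian_T_minus_V(m, k):
    """Assumed example: L = T - V for a mass m on a spring of stiffness k."""
    def L(t, x, v):
        return 0.5 * m * v * v - 0.5 * k * x * x
    return L


def lagrangian_action(L, path, velocity, t1, t2, n=1000):
    """Approximate S[path](t1, t2), the integral of L(t, path(t), velocity(t)),
    by the composite Simpson rule (n even). The velocity is passed in
    directly instead of being computed from the path."""
    h = (t2 - t1) / n
    total = 0.0
    for j in range(n + 1):
        t = t1 + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * L(t, path(t), velocity(t))
    return total * h / 3


L = lagrangian_T_minus_V(1.0, 1.0)
S = lagrangian_action(L, math.cos, lambda t: -math.sin(t), 0.0, 1.0)
print(S, -math.sin(2.0) / 4)  # agree closely
```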

A realizable path of the system is to be distinguished from oth- 
ers by having stationary action with respect to some set of nearby 
unrealizable paths. Now some paths near realizable paths will 
also be realizable: for any motion of the juggling pin there is an- 
other that is slightly different. So when addressing the question 
of whether the action is stationary with respect to variations of 
the path we must somehow restrict the set of paths we are con- 
sidering to contain only one realizable path. It will turn out that 
for Lagrangians that depend only on the configuration and rate 
of change of configuration it is enough to restrict the set of paths 
to those that have the same configuration at the endpoints of the 
path segment. 

The Principle of Stationary Action¹⁴ asserts that for each dynamical
system we can cook up a Lagrangian such that a realizable
path connecting the configurations at two times t₁ and t₂ is
distinguished from all conceivable paths by the fact that the action
S[γ](t₁, t₂) is stationary with respect to variations of the path.
For Lagrangians that depend only on the configuration and rate
of change of configuration, the variations are restricted to those
that preserve the configurations at t₁ and t₂.
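A numerical sketch of the principle for an assumed system (the unit-mass harmonic oscillator again): vary a path by εη, with η vanishing at t₁ and t₂, and estimate dS/dε at ε = 0 by a central difference. For the realizable path cos t the estimate is zero to rounding error; for the straight line joining the same endpoint configurations it is not. The single variation η(t) = sin(π(t − t₁)/(t₂ − t₁)) is an illustrative choice; stationarity requires a zero first variation for every admissible η.

```python
import math


def lagrangian(t, x, v):
    """Assumed example: unit-mass harmonic oscillator."""
    return 0.5 * v * v - 0.5 * x * x


def action(q, dq, t1, t2, n=1000):
    """Composite Simpson approximation of the integral of L along q."""
    h = (t2 - t1) / n
    total = 0.0
    for j in range(n + 1):
        t = t1 + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * lagrangian(t, q(t), dq(t))
    return total * h / 3


def first_variation(q, dq, t1, t2, eps=1e-4):
    """Central-difference estimate of dS[q + e*eta]/de at e = 0, where
    eta(t) = sin(w (t - t1)) with w = pi/(t2 - t1) vanishes at t1 and t2."""
    w = math.pi / (t2 - t1)

    def varied_action(e):
        return action(lambda t: q(t) + e * math.sin(w * (t - t1)),
                      lambda t: dq(t) + e * w * math.cos(w * (t - t1)),
                      t1, t2)

    return (varied_action(eps) - varied_action(-eps)) / (2 * eps)


# A straight line through the same endpoint configurations as cos t.
slope = math.cos(1.0) - 1.0


def line(t):
    return 1.0 + slope * t


def line_velocity(t):
    return slope


print(first_variation(math.cos, lambda t: -math.sin(t), 0.0, 1.0))  # about 0
print(first_variation(line, line_velocity, 0.0, 1.0))  # about -(1 + cos 1)/pi
```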


14 The principle is often called the “Principle of Least Action” because its
initial formulations spoke in terms of the action being minimized rather than 
the more general case of taking on a stationary value. The term “Principle of 
Least Action” is also commonly used to refer to a result, due to Maupertuis, 
Euler, and Lagrange, which says that free particles move along paths for which 
the integral of the kinetic energy is minimized among all paths with the given 
endpoints. Correspondingly, the term “action” is sometimes used to refer 
specifically to the integral of the kinetic energy. (Actually, Euler and Lagrange 
used the vis viva, or twice the kinetic energy.) 


Other ways of stating the principle of stationary action make it sound teleo- 
logical and mysterious. For instance, one could imagine that the system con- 
siders all possible paths from its initial configuration to its final configuration 
and then chooses the one with the smallest action. Indeed, the underlying vi- 
sion of a purposeful, economical, and rational universe played no small part in 
the philosophical considerations that accompanied the initial development of 
Exercise 1.1: Fermat optics


Fermat observed that the laws of reflection and refraction could be ac- 
counted for by the following facts: Light travels in a straight line in any 
particular medium with a velocity that depends upon the medium. The 
path taken by a ray from a source to a destination through any sequence 
of media is a path of least total time, compared to neighboring paths. 
Show that these facts do imply the laws of reflection and refraction.¹⁶
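A numerical sanity check (not the derivation the exercise asks for) can be sketched by minimizing the travel time directly. The geometry is an assumption: a flat interface along y = 0, a source at (0, a) in a medium where light travels at speed v1, and a destination at (d, −b) where it travels at v2. Golden-section search is used only because the travel time is convex in the crossing point; any one-dimensional minimizer would do.

```python
import math


def travel_time(x, a, b, d, v1, v2):
    """Time from (0, a) to the interface point (x, 0) at speed v1, then
    on to (d, -b) at speed v2 (assumed geometry)."""
    return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2


def golden_section_min(f, lo, hi, tol=1e-10):
    """Locate the minimum of a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    c, e = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fe = f(c), f(e)
    while hi - lo > tol:
        if fc < fe:                 # minimum lies in [lo, e]
            hi, e, fe = e, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:                       # minimum lies in [c, hi]
            lo, c, fc = c, e, fe
            e = lo + g * (hi - lo)
            fe = f(e)
    return (lo + hi) / 2


def sine_ratio(a, b, d, v1, v2):
    """sin(incidence) / sin(refraction) at the least-time crossing point."""
    x = golden_section_min(lambda s: travel_time(s, a, b, d, v1, v2), 0.0, d)
    sin1 = x / math.hypot(x, a)
    sin2 = (d - x) / math.hypot(d - x, b)
    return sin1 / sin2


# Refractive index n = c / v, so Snell's law predicts sin1/sin2 = v1/v2.
ratio = sine_ratio(a=1.0, b=1.0, d=2.0, v1=1.0, v2=0.75)
print(ratio, 1.0 / 0.75)
```

Setting v1 = v2 covers the reflection case, since the distance from the mirror point (x, 0) to a destination at (d, b) above the mirror is the same hypot(d − x, b); the ratio is then 1, so the angles of incidence and reflection agree.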


1.2 Configuration Spaces 


Let us consider mechanical systems that can be thought of as 
composed of constituent point particles, with mass and position, 
but with no internal structure.¹⁷ Extended bodies may be thought
of as composed of a large number of these constituent particles 
with specific spatial relationships between them. Extended bodies 
maintain their shape because of spatial constraints between the 
constituent particles. Specifying the position of all the constituent 
particles of a system specifies the configuration of the system. The 
existence of constraints between parts of the system, such as those 
that determine the shape of an extended body, means that the 
constituent particles cannot assume all possible positions. The 
set of all configurations of the system that can be assumed is 
called the configuration space of the system. The dimension of the 


mechanics. The earliest action principle that remains part of modern physics is 
Fermat’s Principle, which states that the path traveled by a light ray between 
two points is the path that takes the least amount of time. Fermat formu- 
lated this principle around 1660 and used it to derive the laws of reflection 
and refraction. Motivated by this, the French mathematician and astronomer 
Pierre-Louis Moreau de Maupertuis enunciated the Principle of Least Action 
as a grand unifying principle in physics. In his Essai de cosmologie (1750) 
Maupertuis appealed to this principle of “economy in nature” as evidence of 
the existence of God, asserting that it demonstrated “God’s intention to regu- 
late physical phenomena by a general principle of the highest perfection.” For 
a historical perspective of Maupertuis’s, Euler’s, and Lagrange’s roles in the 
formulation of the principle of least action, see Jourdain [25]. 


16 For reflection the angle of incidence is equal to the angle of reflection.
Refraction is described by Snell’s law. Snell’s law is that when light passes from
one medium to another, the ratio of the sines of the angles made to the normal
to the interface is the inverse of the ratio of the refractive indices of the media.
The refractive index is the ratio of the speed of light in the vacuum to the
speed of light in the medium.


17 We often refer to a point particle with mass but no internal structure as a
point mass.
configuration space is the smallest number of parameters that have
to be given to completely specify a configuration. The dimension 
of the configuration space is also called the number of degrees of 
freedom of the system.¹⁸

For a single unconstrained particle it takes three parameters to 
specify the configuration. Thus the configuration space of a point 
particle is three dimensional. If we are dealing with a system with 
more than one point particle, the configuration space is more com- 
plicated. If there are k separate particles we need 3k parameters 
to describe the possible configurations. If there are constraints 
among the parts of a system the configuration is restricted to a 
lower-dimensional space. For example, a system consisting of two 
point particles constrained to move in three dimensions so that the 
distance between the particles remains fixed has a five-dimensional 
configuration space: for instance, with three numbers we can fix
the position of one particle, and with two others we can give the 
position of the other particle relative to the first. 
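To make the five-parameter description concrete, here is a plain-Python sketch (an illustration, not from the original text). The function name two_particle_positions is hypothetical, and using spherical angles for the direction from the first particle to the second is only one possible choice; it is singular at the poles θ = 0 and θ = π.

```python
# Illustrative sketch; the function name and angle convention are hypothetical.
import math

def two_particle_positions(params, distance):
    """Positions of two particles a fixed distance apart, from five numbers.

    params = (x, y, z, theta, phi): (x, y, z) is particle 1; theta (polar)
    and phi (azimuth) give the direction from particle 1 to particle 2.
    """
    x, y, z, theta, phi = params
    p1 = (x, y, z)
    offset = (distance * math.sin(theta) * math.cos(phi),
              distance * math.sin(theta) * math.sin(phi),
              distance * math.cos(theta))
    p2 = tuple(c + o for c, o in zip(p1, offset))
    return p1, p2

p1, p2 = two_particle_positions((1.0, -2.0, 0.5, 0.7, 2.1), distance=3.0)
print(p1, p2)
print(math.dist(p1, p2))  # 3.0, up to rounding
```

Six rectangular coordinates come out, but only five numbers go in; the fixed-distance constraint is satisfied automatically.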

Consider a juggling pin. The configuration of the pin is specified 
if we give the positions of every atom making up the pin. However, 
there exist more economical descriptions of the configuration. In 
the idealization that the juggling pin is truly rigid, the distances 
among all the atoms of the pin remain constant. So we can specify 
the configuration of the pin by giving the position of a single atom 
and the orientation of the pin. Using the constraints, the positions 
of all the other constituents of the pin can be determined from 
this information. The dimension of the configuration space of 
the juggling pin is six: the minimum number of parameters that 
specify the position in space is three, and the minimum number 
of parameters that specify an orientation is also three. 
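The six numbers for a rigid body can likewise be turned into the positions of its constituents. This plain-Python sketch assumes z-x-z Euler angles for the orientation and takes the "single atom" to be a body point at the origin of the body frame; the names rotation_zxz and place_body and the sample body points are hypothetical. The assertions at the end check that every pairwise distance survives the placement, which is the rigidity constraint.

```python
# Illustrative sketch; names, angle convention, and body points are hypothetical.
import itertools
import math

def rotation_zxz(phi, theta, psi):
    """Rotation matrix R = Rz(phi) Rx(theta) Rz(psi) (z-x-z Euler angles)."""
    def rz(a):
        c, s = math.cos(a), math.sin(a)
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    def rx(a):
        c, s = math.cos(a), math.sin(a)
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return matmul(matmul(rz(phi), rx(theta)), rz(psi))

def place_body(body_points, position, angles):
    """Positions of all constituents from six numbers: the position of the
    body-frame origin (3 numbers) and an orientation (3 Euler angles)."""
    R = rotation_zxz(*angles)
    return [tuple(position[i] + sum(R[i][k] * p[k] for k in range(3))
                  for i in range(3))
            for p in body_points]

# A toy "juggling pin": a few points fixed in the body frame.
pin = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.3), (0.02, 0.0, 0.1), (0.0, 0.03, 0.2)]
placed = place_body(pin, position=(1.0, 2.0, 3.0), angles=(0.4, 1.1, -0.7))

# Rigidity: every pairwise distance is unchanged by the placement.
for i, j in itertools.combinations(range(len(pin)), 2):
    assert abs(math.dist(pin[i], pin[j]) - math.dist(placed[i], placed[j])) < 1e-12
```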

As a system evolves with time, the constituent particles move 
subject to the constraints. The motion of each constituent particle 


18 Strictly speaking, the dimension of the configuration space and the number
of degrees of freedom are not the same. The number of degrees of freedom is 
the dimension of the space of configurations that are “locally accessible.” For 
systems with integrable constraints the two are the same. For systems with 
non-integrable constraints the configuration dimension can be larger than the 
number of degrees of freedom. For further explanation see the discussion of 
systems with non-integrable constraints below (section 1.10.3). Apart from 
that discussion, all of the systems we will consider have integrable constraints 
(they are “holonomic”). This is why we have chosen to blur the distinction be- 
tween the number of degrees of freedom and the dimension of the configuration 
space. 
is specified by describing the changing configuration. Thus, the
motion of the system may be described as evolving along a path 
in configuration space. The configuration path may be specified 
by a function, the configuration-path function, which gives the 
configuration of the system at any time. 


Exercise 1.2: Degrees of freedom 

For each of the mechanical systems described below, give the number of 
degrees of freedom of the configuration space. 

a. Three juggling pins. 


b. A spherical pendulum, consisting of a point mass hanging from a 
rigid massless rod attached to a fixed support point. The pendulum 
bob may move in any direction subject to the constraint imposed by the 
rigid rod. The point mass is subject to the uniform force of gravity. 

c. A spherical double pendulum, consisting of one point mass hanging
from a rigid massless rod attached to a second point mass hanging from
a second massless rod attached to a fixed support point. The point masses
are subject to the uniform force of gravity.


d. A point mass sliding without friction on a rigid curved wire. 


e. A top consisting of a rigid axisymmetric body with one point on the 
symmetry axis of the body attached to a fixed support, subject to a 
uniform gravitational force. 


f. The same as e, but not axisymmetric. 


1.3 Generalized Coordinates 


In order to be able to talk about specific configurations we need to 
have a set of parameters that label the configurations. The param- 
eters that are used to specify the configuration of the system are 
called the generalized coordinates. Consider an unconstrained free 
particle. The configuration of the particle is specified by giving 
its position. This requires three parameters. The unconstrained 
particle has three degrees of freedom. One way to specify the po- 
sition of a particle is to specify its rectangular coordinates relative 
to some chosen coordinate axes. The rectangular components of 
the position are generalized coordinates for an unconstrained par- 
ticle. Or consider an ideal planar double pendulum: a point mass 
constrained to always be a given distance from a fixed point by a 
rigid rod, with a second mass that is constrained to be at a given 
distance from the first mass by another rigid rod, all confined to a 
vertical plane. The configuration is specified if the orientation of
the two rods is given. This requires at least two parameters; the 
planar double pendulum has two degrees of freedom. One way to 
specify the orientation of each rod is to specify the angle it makes 
with the vertical. These two angles are generalized coordinates 
for the planar double pendulum. 

The number of coordinates need not be the same as the dimen- 
sion of the configuration space, though there must be at least that 
many. We may choose to work with more parameters than neces- 
sary, but then the parameters will be subject to constraints that 
restrict the system to possible configurations, that is, to elements 
of the configuration space. 

For the planar double pendulum described above, the two angle 
coordinates are enough to specify the configuration. We could 
also take as generalized coordinates the rectangular coordinates of 
each of the masses in the plane, relative to some chosen coordinate 
axes. These are also fine coordinates, but we will have to explicitly 
keep in mind the constraints that limit the possible configurations 
to the actual geometry of the system. Sets of coordinates with 
the same dimension as the configuration space are easier to work 
with because we do not have to deal with explicit constraints 
among the coordinates. So for the time being we will consider 
only formulations where the number of configuration coordinates 
is equal to the number of degrees of freedom; later we will learn 
how to handle systems with redundant coordinates and explicit 
constraints. 
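For the planar double pendulum, the two angles determine the rectangular coordinates of both masses, and the rod-length constraints then hold automatically. The plain-Python sketch below assumes angles measured from the downward vertical, a support at the origin, and y increasing upward; the function names are hypothetical.

```python
# Illustrative sketch; function names and conventions are hypothetical.
import math

def double_pendulum_positions(theta0, theta1, l0, l1):
    """Rectangular coordinates of both bobs from the two rod angles."""
    x0, y0 = l0 * math.sin(theta0), -l0 * math.cos(theta0)
    x1, y1 = x0 + l1 * math.sin(theta1), y0 - l1 * math.cos(theta1)
    return (x0, y0), (x1, y1)

def constraint_residuals(bob0, bob1, l0, l1):
    """The explicit constraints that rectangular coordinates must satisfy."""
    return (math.dist((0.0, 0.0), bob0) - l0, math.dist(bob0, bob1) - l1)

bob0, bob1 = double_pendulum_positions(0.3, -1.2, l0=1.0, l1=0.5)
print(bob0, bob1)
print(constraint_residuals(bob0, bob1, 1.0, 0.5))  # both essentially zero
```

With the angles as coordinates the constraints never have to be imposed; with the four rectangular coordinates they would have to be carried along explicitly.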

In general, the configurations form a space $M$ of some dimension
$n$. The $n$-dimensional configuration space can be parametrized
by choosing a coordinate function $\chi$ that maps elements of the
configuration space to $n$-tuples of real numbers. If there is more
than one dimension, the function $\chi$ is a tuple of $n$ independent
coordinate functions19 $\chi^i$, $i = 0, \ldots, n-1$, where each $\chi^i$ is a
real-valued function defined on some region of the configuration
space.20 For a given configuration $m$ in the configuration space $M$


19A tuple of functions that all have the same domain is itself a function on 
that domain: Given a point in the domain the value of the tuple of functions 
is a tuple of the values of the component functions at that point. 


20 The use of superscripts to index the coordinate components is traditional,
even though there is potential confusion, say, with exponents. We use
zero-based indexing.
the values $\chi^i(m)$ of the coordinate functions are the generalized
coordinates of the configuration. These generalized coordinates
permit us to identify points of the $n$-dimensional configuration
space with $n$-tuples of real numbers.21 For any given configuration
space, there is a great variety of ways to choose generalized
coordinates. Even for a single point moving without constraints,
we can choose rectangular coordinates, polar coordinates, or any
other coordinate system that strikes our fancy.

The motion of the system can be described by a configuration
path $\gamma$ mapping time to configuration-space points. Corresponding
to the configuration path is a coordinate path $q = \chi \circ \gamma$ mapping
time to tuples of generalized coordinates. If there is more than
one degree of freedom the coordinate path is a structured object:
$q$ is a tuple of component coordinate path functions $q^i = \chi^i \circ \gamma$.
At each instant of time $t$, the values $q(t) = (q^0(t), \ldots, q^{n-1}(t))$ are
the generalized coordinates of a configuration.

The derivative $Dq$ of the coordinate path $q$ is a function22 that
gives the rate of change of the configuration coordinates at a given
time: $Dq(t) = (Dq^0(t), \ldots, Dq^{n-1}(t))$. The rate of change of a
generalized coordinate is called a generalized velocity.
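The composition $q = \chi \circ \gamma$ and the generalized velocity $Dq$ can be mimicked numerically. In this plain-Python sketch (not Scmutils), a Configuration object stands in for a configuration-space point, two hypothetical charts play the role of $\chi$, and $D$ is approximated by a central difference rather than computed exactly. The same motion yields different coordinate paths, and different generalized velocities, in the two charts.

```python
# Illustrative sketch; all names here are hypothetical (not Scmutils).
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Configuration:
    """Stand-in for a point of a planar configuration space.

    Storing x and y is an implementation convenience; only the charts
    below read these fields.
    """
    x: float
    y: float

def chi_rectangular(m):
    return (m.x, m.y)

def chi_polar(m):
    return (math.hypot(m.x, m.y), math.atan2(m.y, m.x))

def compose(f, g):
    return lambda t: f(g(t))

def D(q, h=1e-5):
    """Central-difference derivative of a tuple-valued function of time."""
    return lambda t: tuple((a - b) / (2 * h) for a, b in zip(q(t + h), q(t - h)))

# A configuration path: uniform circular motion of radius 2.
gamma = lambda t: Configuration(2 * math.cos(t), 2 * math.sin(t))

q_rect = compose(chi_rectangular, gamma)  # q = chi o gamma in one chart
q_polar = compose(chi_polar, gamma)       # the same motion in another chart

print(q_polar(1.0), D(q_polar)(1.0))  # approximately (2, 1) and (0, 1)
print(q_rect(1.0), D(q_rect)(1.0))    # approximately (1.08, 1.68) and (-1.68, 1.08)
```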

We can make coordinate representations for higher derivatives 
of the path as well. We introduce the function $\mathcal{C}_\chi$ (pronounced


21 More precisely, the generalized coordinates identify open subsets of the
configuration space with open subsets of $\mathbb{R}^n$. It may require more than one set of
generalized coordinates to cover the entire configuration space. For example, 
if the configuration space is a two-dimensional sphere, we could have one set 
of coordinates that maps (a little more than) the northern hemisphere to a 
disk, and another set that maps (a little more than) the southern hemisphere 
to a disk, with a strip near the equator common to both coordinate systems. 
A space that can be locally parametrized by smooth coordinate functions is 
called a differentiable manifold. The theory of differentiable manifolds can be 
used to formulate a coordinate-free treatment of variational mechanics. An 
introduction to mechanics from this perspective can be found in [2] or [5].


22 The derivative of a function $f$ is a function. It is denoted $Df$. Our notational
convention is that $D$ is a high-precedence operator. Thus $D$ operates on the
adjacent function before any other application occurs: $Df(x)$ is the same as
$(Df)(x)$.
“chart”) that extends a coordinate representation to the local tuple:23

$$\mathcal{C}_\chi(t, \gamma(t), D\gamma(t), \ldots) = (t, q(t), Dq(t), \ldots), \tag{1.5}$$

where $q = \chi \circ \gamma$. The function $\mathcal{C}_\chi$ takes the coordinate-free local
tuple $(t, \gamma(t), D\gamma(t), \ldots)$ and gives a coordinate representation as
a tuple of the time, the value of the coordinate path function at
that time, and the values of as many derivatives of the coordinate
path function as are needed.

Given a coordinate path $q = \chi \circ \gamma$, the rest of the local tuple can
be computed from it. We introduce a function $\Gamma$ that does this:

$$\Gamma[q](t) = (t, q(t), Dq(t), \ldots). \tag{1.6}$$

The evaluation of $\Gamma$ only involves taking derivatives of the coordinate
path $q = \chi \circ \gamma$; the function $\Gamma$ does not depend on $\chi$. From
relations (1.5) and (1.6) we find

$$\Gamma[q] = \mathcal{C}_\chi \circ \Gamma[\gamma]. \tag{1.7}$$


Exercise 1.3: Generalized coordinates 


For each of the systems described in exercise 1.2 specify a system of 
generalized coordinates that can be used to describe the behavior of the 
system. 


Lagrangians in generalized coordinates 

The action is a property of a configuration path segment for a
particular Lagrangian $\mathcal{L}$. The action does not depend on the
coordinate system that is used to label the configurations. We can
use this property to find a coordinate representation $L_\chi$ for the
Lagrangian $\mathcal{L}$.


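23 The formal definition of $\mathcal{C}_\chi$ is unimportant to the discussion, but if you really
want to know, here is one way to do it:

First, we define the derivative $D\gamma$ of a configuration path $\gamma$ in terms of
ordinary derivatives by specifying how it acts on sufficiently smooth
real-valued functions $f$ of configurations: $(D^n\gamma)(t)(f) = D^n(f \circ \gamma)(t)$. Then we
define $\mathcal{C}_\chi(a, b, c, d, \ldots) = (a, \chi(b), c(\chi), d(\chi), \ldots)$. With this definition:

$$\begin{aligned}
\mathcal{C}_\chi(t, \gamma(t), D\gamma(t), D^2\gamma(t), \ldots)
&= (t, \chi(\gamma(t)), D\gamma(t)(\chi), D^2\gamma(t)(\chi), \ldots) \\
&= (t, \chi \circ \gamma(t), D(\chi \circ \gamma)(t), D^2(\chi \circ \gamma)(t), \ldots) \\
&= (t, q(t), Dq(t), D^2 q(t), \ldots).
\end{aligned}$$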




The action is

$$S[\gamma](t_1, t_2) = \int_{t_1}^{t_2} \mathcal{L} \circ \Gamma[\gamma]. \tag{1.8}$$


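The Lagrangian $\mathcal{L}$ is a function of the local tuple $\Gamma[\gamma](t) =
(t, \gamma(t), D\gamma(t), \ldots)$. The local tuple has the coordinate representation
$\Gamma[q] = \mathcal{C}_\chi \circ \Gamma[\gamma]$, where $q = \chi \circ \gamma$. So if we choose24

$$L_\chi = \mathcal{L} \circ \mathcal{C}_\chi^{-1}, \tag{1.9}$$

then25

$$L_\chi \circ \Gamma[q] = \mathcal{L} \circ \Gamma[\gamma]. \tag{1.10}$$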


On the left we have the composition of functions that use the 
intermediary of a coordinate representation; on the right we have 
the composition of two functions that do not involve coordinates. 
We define the coordinate representation of the action to be 


$$S_\chi[q](t_1, t_2) = \int_{t_1}^{t_2} L_\chi \circ \Gamma[q]. \tag{1.11}$$

The function $S_\chi$ takes a coordinate path; the function $S$ takes a
configuration path. Since the integrands are the same by equation
(1.10), the integrals have the same value:

$$S[\gamma](t_1, t_2) = S_\chi[\chi \circ \gamma](t_1, t_2). \tag{1.12}$$


So we have a way of constructing coordinate representations of a 
Lagrangian that gives the same action for a path in any coordinate 
system. 
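Equation (1.12) says the action does not depend on the chart. A plain-Python sketch can check this for a free particle in the plane: the rectangular Lagrangian and its polar representation $L_\chi = \frac{1}{2}m((Dr)^2 + r^2 (D\theta)^2)$ give the same action along the same path. The path, the mass, and the helper names are hypothetical; velocities come from central differences, and the integral uses composite Simpson's rule rather than the Scmutils integrator.

```python
# Illustrative sketch; every name here is hypothetical (not Scmutils).
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def Gamma(q, h=1e-5):
    """Map a coordinate path to t -> (t, q(t), Dq(t)), Dq by central differences."""
    def local(t):
        dq = tuple((a - b) / (2 * h) for a, b in zip(q(t + h), q(t - h)))
        return (t, q(t), dq)
    return local

def action(L, q, t1, t2):
    """Integral of L o Gamma[q] from t1 to t2."""
    local = Gamma(q)
    return simpson(lambda t: L(local(t)), t1, t2)

m = 3.0

def L_rect(local):
    _, _, (vx, vy) = local
    return 0.5 * m * (vx * vx + vy * vy)

def L_polar(local):
    # L_chi = L o C_chi^{-1}: the same kinetic energy in polar coordinates.
    _, (r, _), (dr, dtheta) = local
    return 0.5 * m * (dr * dr + r * r * dtheta * dtheta)

# One configuration path seen through two charts (x > 0 throughout,
# so atan2 stays continuous).
def q_rect(t):
    return (2 + math.cos(t), 1 + 0.5 * math.sin(2 * t))

def q_polar(t):
    x, y = q_rect(t)
    return (math.hypot(x, y), math.atan2(y, x))

S_rect = action(L_rect, q_rect, 0.0, 2.0)
S_polar = action(L_polar, q_polar, 0.0, 2.0)
print(S_rect, S_polar)  # agree to within the numerical error
```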

For Lagrangians that depend only on positions and velocities 
the action can also be written 


$$S_\chi[q](t_1, t_2) = \int_{t_1}^{t_2} L_\chi(t, q(t), Dq(t))\, dt. \tag{1.13}$$


24 The coordinate function $\chi$ is locally invertible, and so is $\mathcal{C}_\chi$.


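25 $\mathcal{L} \circ \Gamma[\gamma] = \mathcal{L} \circ \mathcal{C}_\chi^{-1} \circ \mathcal{C}_\chi \circ \Gamma[\gamma] = L_\chi \circ \Gamma[\chi \circ \gamma] = L_\chi \circ \Gamma[q]$.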
The coordinate system used in the definition of a Lagrangian or
an action is usually unambiguous, so the subscript $\chi$ will usually
be dropped.


1.4 Computing Actions 


To illustrate the above ideas, and to introduce their formulation as 
computer programs, we consider the simplest mechanical system— 
a free particle moving in three dimensions. Euler and Lagrange 
discovered that for a free particle the time-integral of the kinetic 
energy over the particle’s actual path is smaller than the same 
integral along any alternative path between the same points: a 
free particle moves according to the principle of stationary action, 
provided we take the Lagrangian to be the kinetic energy. The
kinetic energy for a particle of mass $m$ and velocity $\vec{v}$ is $\frac{1}{2}mv^2$,
where $v$ is the magnitude of $\vec{v}$. In this case we can choose the
generalized coordinates to be the ordinary rectangular coordinates.

Following Euler and Lagrange, the Lagrangian for the free
particle is26

$$L(t, x, v) = \tfrac{1}{2} m (v \cdot v), \tag{1.14}$$

where the formal parameter $x$ names a tuple of components of
the position with respect to a given rectangular coordinate
system, and where the formal parameter $v$ names a tuple of velocity
components.27

We can express this formula as a procedure: 


26 Here we are making a function definition. A definition specifies the value
of the function for arbitrarily chosen formal parameters. One may change
the name of a formal parameter, so long as the new name does not conflict
with any other symbol in the definition. For example, the following definition
specifies exactly the same free-particle Lagrangian:

$$L(a, b, c) = \tfrac{1}{2} m (c \cdot c).$$


27 The Lagrangian is formally a function of the local tuple, but any particular 
Lagrangian only depends on a finite initial segment of the local tuple. We 
define functions of local tuples by explicitly declaring names for the elements 
of the initial segment of the local tuple that includes the elements upon which 
the function depends. 
(define ((L-free-particle mass) local)
  (let ((v (velocity local)))
    (* 1/2 mass (dot-product v v))))


The definition indicates that L-free-particle is a procedure that 
takes mass as an argument and returns a procedure that takes 
a local tuple local,28 extracts the generalized velocity with the
procedure velocity, and uses the velocity to compute the value 
of the Lagrangian. 
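The same curried structure can be sketched outside Scheme. In this plain-Python illustration, a local tuple is an ordinary tuple (t, q, v), and make_local, time, coordinate, and velocity are hypothetical stand-ins for the Scmutils constructor and selectors described in footnote 28; they are not the Scmutils procedures themselves.

```python
# Illustrative sketch; these names are hypothetical stand-ins, not Scmutils.

def make_local(t, q, v):
    """Build a local tuple (t, q, v), in the spirit of ->local."""
    return (t, q, v)

def time(local):        # (component 0)
    return local[0]

def coordinate(local):  # (component 1)
    return local[1]

def velocity(local):    # (component 2)
    return local[2]

def dot_product(u, w):
    return sum(a * b for a, b in zip(u, w))

def L_free_particle(mass):
    """Curried like the Scheme version: give the mass, get a Lagrangian."""
    def lagrangian(local):
        v = velocity(local)
        return 0.5 * mass * dot_product(v, v)
    return lagrangian

local = make_local(0.0, (7.0, 5.0, 1.0), (4.0, 3.0, 2.0))
print(L_free_particle(3.0)(local))  # 43.5
```

Supplying the mass first returns a function that depends only on the local tuple, which is exactly the shape the action computation needs.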

Suppose we let $q$ denote a coordinate path function that maps
time to position components:29

$$q(t) = (x(t), y(t), z(t)). \tag{1.15}$$

We can make this definition30


(define q
  (up (literal-function 'x)
      (literal-function 'y)
      (literal-function 'z)))


where literal-function makes a procedure that represents a 
function of one argument that has no known properties other than 
the given symbolic name.31 The symbol q now names a procedure


28 We represent the local tuple as a composite data structure, the components 
of which are the time, the generalized coordinates, the generalized velocities, 
and possibly higher derivatives. We do not want to be bothered by the details 
of packing and unpacking the components into these structures, so we provide 
utilities for doing this. The constructor ->local takes the time, the coor- 
dinates, and the velocities and returns a data structure representing a local 
tuple. The selectors time, coordinate, and velocity extract the appropriate
pieces from the local structure; they are defined as time = (component 0),
coordinate = (component 1), and velocity = (component 2).


29 Be careful. The x in the definition of q is not the same as the x that was used
as a formal parameter in the definition of the free-particle Lagrangian above. 
There are only so many letters in the alphabet, so we are forced to reuse them. 
We will be careful to indicate where symbols are given new meanings. 


30A tuple of coordinate or velocity components is made with the procedure 
up. Component i of the tuple q is (ref q i). All indexing is zero based. The 
word up is to remind us that in mathematical notation these components are 
indexed by superscripts. There are also down tuples of components that are 
indexed by subscripts. See the appendix on notation. 


31 In our system, arithmetic operators are generic over symbols and expressions
as well as numeric values, so arithmetic procedures can work uniformly with
numbers or expressions. For example, if we have the procedure (define (cube
of one real argument (time) that produces a tuple of three
components representing the coordinates at that time. For example,
we can evaluate this procedure for a symbolic time t as follows:


(print-expression (q 't))
(up (x t) (y t) (z t))


The procedure print-expression simplifies an expression and
then produces a printable form of it.

The derivative of the coordinate path Dq is the function that 
maps time to velocity components: 


$$Dq(t) = (Dx(t), Dy(t), Dz(t)).$$


We can make and use the derivative of a function.32 For example,
we can write:


(print-expression ((D q) 't))
(up ((D x) t) ((D y) t) ((D z) t))


The function $\Gamma$ takes a coordinate path and returns a function of
time that gives the local tuple $(t, q(t), Dq(t), \ldots)$. We implement
this $\Gamma$ with the procedure Gamma. Here is what Gamma does:


(print-expression ((Gamma q) 't))
(up t
    (up (x t) (y t) (z t))
    (up ((D x) t) ((D y) t) ((D z) t)))


So the composition $L \circ \Gamma[q]$ is a function of time that returns the
value of the Lagrangian for this point on the path:


(print-expression
 ((compose (L-free-particle 'm) (Gamma q)) 't))
(+ (* 1/2 m (expt ((D x) t) 2))
   (* 1/2 m (expt ((D y) t) 2))
   (* 1/2 m (expt ((D z) t) 2)))


x) (* x x x)) we can obtain its value for a number, (cube 2) => 8, or for a
literal symbol, (cube 'a) => (* a a a).


32 Derivatives of functions yield functions. For example, ((D cube) 2) => 12
and ((D cube) 'a) => (* 3 (expt a 2)).
The procedure show-expression is like print-expression except
that it puts the simplified expression into traditional infix form
and displays the result.33 Most of the time we will use this method
of display, to make the boxed expressions that appear in this book.
It also produces the prefix form as returned by print-expression,
but we will usually not show this.34


(show-expression
 ((compose (L-free-particle 'm) (Gamma q)) 't))


$$\tfrac{1}{2} m (Dx(t))^2 + \tfrac{1}{2} m (Dy(t))^2 + \tfrac{1}{2} m (Dz(t))^2$$


According to equation (1.11) we can compute the Lagrangian
action from time $t_1$ to time $t_2$ as:


(define (Lagrangian-action L q t1 t2)
  (definite-integral (compose L (Gamma q)) t1 t2))


Lagrangian-action takes as arguments a procedure L that com- 
putes the Lagrangian, a procedure q that computes a coordinate 
path, and starting and ending times t1 and t2. The definite- 
integral used here takes as arguments a function and two lim- 
its t1 and t2, and computes the definite integral of the function 
over the interval from t1 to t2.35 Notice that the definition of
Lagrangian-action does not depend on any particular set of co- 
ordinates or even the dimension of the configuration space. The 
method of computing the action from the coordinate representa- 
tion of a Lagrangian and a coordinate path does not depend on 
the coordinate system. 

We can now compute the action for the free particle along a 
path. For example, consider a particle moving at uniform speed 


33 The display is generated with TeX.


34For very complicated expressions the prefix notation of Scheme is often bet- 
ter, but simplification is almost always useful. We can separate the functions 
of simplification and infix display. We will see examples of this later. 


35 Scmutils includes a variety of numerical integration procedures. The
examples in this section were computed by rational-function extrapolation of
Euler-MacLaurin formulas with a relative error tolerance of $10^{-10}$.
along a straight line $t \mapsto (4t + 7, 3t + 5, 2t + 1)$.36 We represent
the path as a procedure


(define (test-path t)
  (up (+ (* 4 t) 7)
      (+ (* 3 t) 5)
      (+ (* 2 t) 1)))


For a particle of mass 3, we obtain the action between $t = 0$ and
$t = 10$ as37


(Lagrangian-action (L-free-particle 3.0) test-path 0.0 10.0)
435.


Exercise 1.4: Lagrangian actions 


For a free particle an appropriate Lagrangian is38

$$L(t, x, v) = \tfrac{1}{2} m v^2.$$

Suppose that $x$ is the constant-velocity straight-line path of a free
particle, such that $x_a = x(t_a)$ and $x_b = x(t_b)$. Show that the action on the
solution path is

$$\frac{m}{2} \frac{(x_b - x_a)^2}{t_b - t_a}.$$


Paths of minimum action 

We already know that the actual path of a free particle is uniform 
motion in a straight line. According to Euler and Lagrange the 
action is smaller along a straight-line test path than along nearby 
paths. Let $q$ be a straight-line test path with action $S[q](t_1, t_2)$.
Let $q + \epsilon\eta$ be a nearby path, obtained from $q$ by adding a path


36Surely for a real physical situation we would have to specify units for these 
quantities. In this illustration we do not give units. 


37Here we use decimal numerals to specify the parameters. This forces the 
representations to be floating point, which is efficient for numerical calculation. 
If symbolic algebra is to be done it is essential that the numbers be exact 
integers or rational fractions, so that expressions can be reliably reduced to 
lowest terms. Such numbers are specified without a decimal point. 


38 The squared magnitude of the velocity is $\vec{v} \cdot \vec{v}$, the vector dot-product of
the velocity with itself. The square of a structure of components is defined to
be the sum of the squares of the individual components, so we write simply
$v^2 = v \cdot v$.
variation $\eta$ scaled by the real parameter $\epsilon$.39 The action on the
varied path is $S[q + \epsilon\eta](t_1, t_2)$. Euler and Lagrange found
$S[q + \epsilon\eta](t_1, t_2) > S[q](t_1, t_2)$ for any $\eta$ that is zero at the endpoints and
for any small non-zero $\epsilon$.

Let’s check this numerically by varying the test path, adding
some amount of a test function that is zero at the endpoints $t = t_1$
and $t = t_2$. To make a function $\eta$ that is zero at the endpoints,
given a sufficiently well-behaved function $\nu$, we can use $\eta(t) =
(t - t_1)(t - t_2)\nu(t)$. This can be implemented:


(define ((make-eta nu t1 t2) t)
  (* (- t t1) (- t t2) (nu t)))


We can use this to compute the action for a free particle over a
path varied from the given path, as a function of $\epsilon$:40


(define ((varied-free-particle-action mass q nu t1 t2) epsilon)
  (let ((eta (make-eta nu t1 t2)))
    (Lagrangian-action (L-free-particle mass)
                       (+ q (* epsilon eta))
                       t1
                       t2)))


The action for the varied path, with $\nu(t) = (\sin t, \cos t, t^2)$ and
$\epsilon = 0.001$, is, as expected, larger than for the test path:


((varied-free-particle-action 3.0 test-path
                              (up sin cos square)
                              0.0 10.0)
 0.001)
436.29121428571153


39 Note that we are doing arithmetic on functions. We extend the arithmetic
operations so that the combination of two functions of the same type (same
domains and ranges) is the function on the same domain that combines the
values of the argument functions in the range. For example, if $f$ and $g$ are
functions of $t$, then $fg$ is the function $t \mapsto f(t)g(t)$. A constant multiple of
a function is the function whose value is the constant times the value of the
function for each argument: $cf$ is the function $t \mapsto cf(t)$.


40Note that we are adding procedures. Paralleling our extension of arithmetic 
operations to functions, arithmetic operations are extended to compatible pro- 
cedures. 
We can numerically compute the value of $\epsilon$ for which the action
is minimized. We search between, say, -2 and 1:41

(minimize
 (varied-free-particle-action 3.0 test-path
                              (up sin cos square)
                              0.0 10.0)
 -2.0 1.0)
(-1.5987211554602254e-14 435.0000000000237 5)


We find exactly what is expected: the best value for $\epsilon$ is
zero,42 and the minimum value of the action is the action along
the straight path.


Finding trajectories that minimize the action 
We have used the variational principle to determine if a given 
trajectory is realizable. We can also use the variational princi- 
ple to actually find trajectories. Given a set of trajectories that 
are specified by a finite number of parameters, we can search the 
parameter space looking for the trajectory in the set that best ap- 
proximates the real trajectory by finding one that minimizes the 
action. By choosing a good set of approximating functions we can 
get arbitrarily close to the real trajectory.43

One way to make a parametric path that has fixed endpoints 
is to use a polynomial that goes through the endpoints as well 
as a number of intermediate points. Variation of the positions 
of the intermediate points varies the path; the parameters of the 
varied path are the coordinates of the intermediate positions. The 
procedure make-path constructs such a path using a Lagrange 


41 The arguments to minimize are a procedure implementing the univariate
function in question, and the lower and upper bounds of the region to be
searched. Scmutils includes a choice of methods for numerical minimization;
the one used here is Brent’s algorithm, with an error tolerance of $10^{-5}$. The
value returned by minimize is a list of 3 numbers: the first is the argument 
at which the minimum occurred, the second is the minimum obtained, and 
the third is the number of iterations of the minimization algorithm required 
to obtain the minimum. 


42 Yes, -1.5987211554602254e-14 is zero for the tolerance required of the
minimizer. And the 435.0000000000237 is arguably the same as 435 obtained
before.


43 There are lots of good ways to make such a parametric set of approximating
trajectories. One could use splines or higher-order interpolating polynomials; 
one could use Chebyshev polynomials; one could use Fourier components. The 
choice depends upon the kinds of trajectories one wants to approximate. 
interpolation polynomial.44 The procedure make-path is called
with five arguments: (make-path t0 q0 t1 q1 qs), where q0 and
q1 are the endpoints, t0 and t1 are the corresponding times, and
qs is a list of intermediate points.

Having specified a parametric path we can construct a paramet- 
ric action that is just the action computed along the parametric 
path: 


(define ((parametric-path-action Lagrangian t0 q0 t1 q1) qs)
  (let ((path (make-path t0 q0 t1 q1 qs)))
    (Lagrangian-action Lagrangian path t0 t1)))


We can find approximate solution paths by finding parameters 
that minimize the action. We do this minimization with a canned 
multidimensional minimization procedure:45


(define (find-path Lagrangian t0 q0 t1 q1 n)
  (let ((initial-qs (linear-interpolants q0 q1 n)))
    (let ((minimizing-qs
           (multidimensional-minimize
            (parametric-path-action Lagrangian t0 q0 t1 q1)
            initial-qs)))
      (make-path t0 q0 t1 q1 minimizing-qs))))


44 Here is one way to implement make-path:


(define (make-path t0 q0 t1 q1 qs)
  (let ((n (length qs)))
    (let ((ts (linear-interpolants t0 t1 n)))
      (Lagrange-interpolation-function
       (append (list q0) qs (list q1))
       (append (list t0) ts (list t1))))))


The procedure linear-interpolants produces a list of elements that linearly 
interpolate the first two arguments. We use this procedure here to specify ts, 
the n evenly spaced intermediate times between t0 and t1 at which the path 
will be specified. The parameters being adjusted, qs, are the positions at these 
intermediate times. The procedure Lagrange-interpolation-function takes 
a list of values and a list of times and produces a procedure that computes 
the Lagrange interpolation polynomial that goes through these points. 


45 The minimizer used here is the Nelder-Mead downhill simplex method. As
usual with numerical procedures, the interface to the nelder-mead procedure 
is complex, with lots of optional parameters to allow the user to control errors 
effectively. For this presentation we have specialized nelder-mead by wrapping 
it in the more palatable multidimensional-minimize. Unfortunately, you will 
have to learn to live with complicated numerical procedures someday. 
The procedure multidimensional-minimize takes a procedure (in
this case the value of the call to parametric-path-action) that
computes the function to be minimized (in this case the action)
and an initial guess for the parameters. Here we choose the initial 
guess to be equally-spaced points on a straight line between the 
two endpoints, computed with linear-interpolants. 

To illustrate the use of this strategy, we will find trajectories of
the harmonic oscillator, with Lagrangian46

$$L(t, q, v) = \tfrac{1}{2} m v^2 - \tfrac{1}{2} k q^2, \tag{1.16}$$


for mass m and spring constant k. This Lagrangian is imple- 
mented by 


(define ((L-harmonic m k) local)
  (let ((q (coordinate local))
        (v (velocity local)))
    (- (* 1/2 m (square v)) (* 1/2 k (square q)))))


We can find an approximate path taken by the harmonic oscillator
for $m = 1$ and $k = 1$ between $q(0) = 1$ and $q(\pi/2) = 0$ as
follows:47


(define q (find-path (L-harmonic 1.0 1.0) 0. 1. :pi/2 0. 3))


We know that the trajectories of this harmonic oscillator, for
$m = 1$ and $k = 1$, are

$$q(t) = A \cos(t + \varphi), \tag{1.17}$$

where the amplitude $A$ and the phase $\varphi$ are determined by the
initial conditions. For the chosen endpoint conditions the solution
is $q(t) = \cos(t)$. The approximate path should be an approximation
to cosine over the range from 0 to $\pi/2$. Figure 1.1 shows the
error in the polynomial approximation produced by this process.
The maximum error in the approximation with three intermediate
points is less than $1.7 \times 10^{-4}$. We find, as expected, that the
error in the approximation decreases as the number of intermedi- 


46Don’t worry. We know that you don’t yet know why this is the right La- 
grangian. We will get to this in section 1.6. 


47 By convention, named constants have names that begin with a colon. The
constants named :pi and :-pi are what we would expect from their names.
[Figure 1.1 plot: error (ordinate, from -0.0002 to +0.0002) versus time (abscissa, from 0 to π/2)]


Figure 1.1 The difference between the polynomial approximation
with minimum action and the actual trajectory taken by the harmonic
oscillator. The abscissa is the time and the ordinate is the error.


ate points is increased. For four intermediate points it is about a 
factor of 15 better. 
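The find-path strategy can be sketched in plain Python for the harmonic oscillator. Two substitutions are worth flagging: velocities are approximated by central differences, and, because this action is a quadratic function of the intermediate values, the sketch recovers its gradient and Hessian exactly from a few evaluations and solves a small linear system instead of running Nelder-Mead. The helper names are hypothetical, although lagrange_interpolation, linear_interpolants, and make_path follow the outline in footnote 44.

```python
# Illustrative sketch; names are hypothetical, and a direct quadratic solve
# replaces the Nelder-Mead minimizer used in the text.
import math

def lagrange_interpolation(ts, qs):
    """Polynomial through the points (ts[i], qs[i]), in Lagrange form."""
    def p(t):
        total = 0.0
        for i, (ti, qi) in enumerate(zip(ts, qs)):
            basis = 1.0
            for j, tj in enumerate(ts):
                if j != i:
                    basis *= (t - tj) / (ti - tj)
            total += qi * basis
        return total
    return p

def linear_interpolants(a, b, n):
    """n evenly spaced values strictly between a and b."""
    return [a + (b - a) * k / (n + 1) for k in range(1, n + 1)]

def make_path(t0, q0, t1, q1, qs):
    ts = linear_interpolants(t0, t1, len(qs))
    return lagrange_interpolation([t0] + ts + [t1], [q0] + list(qs) + [q1])

def L_harmonic(m, k):
    return lambda t, q, v: 0.5 * m * v * v - 0.5 * k * q * q

def lagrangian_action(L, q, t0, t1, n=200, h=1e-5):
    """Composite Simpson's rule; velocities by central differences."""
    def integrand(t):
        return L(t, q(t), (q(t + h) - q(t - h)) / (2 * h))
    dt = (t1 - t0) / n
    total = integrand(t0) + integrand(t1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(t0 + k * dt)
    return total * dt / 3

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def find_path(L, t0, q0, t1, q1, n):
    """Stationary point of the action over the n intermediate values.

    The action is quadratic in those values, so a few evaluations give its
    gradient at zero and its Hessian exactly; we then solve H x = -g.
    """
    S = lambda qs: lagrangian_action(L, make_path(t0, q0, t1, q1, qs), t0, t1)
    e = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    s0 = S([0.0] * n)
    s1 = [S(e[i]) for i in range(n)]
    grad = [(s1[i] - S([-x for x in e[i]])) / 2 for i in range(n)]
    hess = [[S([a + b for a, b in zip(e[i], e[j])]) - s1[i] - s1[j] + s0
             for j in range(n)] for i in range(n)]
    return make_path(t0, q0, t1, q1, solve(hess, [-g for g in grad]))

q = find_path(L_harmonic(1.0, 1.0), 0.0, 1.0, math.pi / 2, 0.0, 3)
ts = [k * (math.pi / 2) / 100 for k in range(101)]
max_error = max(abs(q(t) - math.cos(t)) for t in ts)
print(max_error)  # small: the degree-4 polynomial tracks cos t closely
```

The printed maximum error is small, but its exact value depends on the quadrature and on solving for the minimum directly rather than iteratively, so it need not match the figure quoted above digit for digit.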


Exercise 1.5: Solution process 


We can watch the progress of the minimization by modifying the proce- 
dure parametric-path-action to plot the path each time the action is 
computed. Try this: 


(define win2 (frame 0. :pi/2 0. 1.2))

(define ((parametric-path-action Lagrangian t0 q0 t1 q1)
         intermediate-qs)
  (let ((path (make-path t0 q0 t1 q1 intermediate-qs)))
    ;; display path
    (graphics-clear win2)
    (plot-function win2 path t0 t1 (/ (- t1 t0) 100))
    ;; compute action
    (Lagrangian-action Lagrangian path t0 t1)))

(find-path (L-harmonic 1. 1.) 0. 1. :pi/2 0. 2)


Exercise 1.6: Minimizing action 


Suppose we try to obtain a path by minimizing an action for an im- 
possible problem. For example, suppose we have a free particle and we 
impose endpoint conditions on the velocities as well as the positions that
are inconsistent with the particle being free. Does the formalism protect 
itself from such an unpleasant attack? You may find it illuminating to 
program it and see what happens. 


1.5 The Euler-Lagrange Equations 


The principle of stationary action characterizes the realizable 
paths of systems in configuration space as those for which the 
action has a stationary value. In elementary calculus, we learn 
that the critical points of a function are the points where the 
derivative vanishes. In an analogous way, the paths along which 
the action is stationary are solutions of a system of differential 
equations. This system, called the Euler-Lagrange equations or 
just the Lagrange equations, is the link that permits us to use 
the principle of stationary action to compute the motions of
mechanical systems, and to relate the variational and Newtonian
formulations of mechanics.48


Lagrange equations 

We will find that if L is a Lagrangian for a system that depends
on time, coordinates, and velocities, and if q is a coordinate path
for which the action S[q](t₁, t₂) is stationary (with respect to any
variation in the path that keeps the endpoints of the path fixed)
then

$$D(\partial_2 L \circ \Gamma[q]) - \partial_1 L \circ \Gamma[q] = 0. \qquad (1.18)$$


Here L is a real-valued function of a local tuple; ∂₁L and ∂₂L
denote the partial derivatives of L with respect to its generalized
position and generalized velocity arguments.49 The function
∂₂L maps a local tuple to a structure whose components are the
derivatives of L with respect to each component of the generalized
velocity. The function Γ[q] maps time to the local tuple:
Γ[q](t) = (t, q(t), Dq(t), ...). Thus the compositions ∂₁L∘Γ[q] and
∂₂L∘Γ[q] are functions of one argument, time. The Lagrange
equations assert that the derivative of ∂₂L∘Γ[q] is equal to
∂₁L∘Γ[q], at any time. Given a Lagrangian, the Lagrange equations
form a system of ordinary differential equations that must be
satisfied by realizable paths.50

48 This result was initially discovered by Euler and later rederived by Lagrange.

49 The derivative or partial derivative of a function that takes structured
arguments is a new function that takes the same number and type of arguments.
The range of this new function is itself a structure with the same number of
components as the argument with respect to which the function is
differentiated.


1.5.1 Derivation of the Lagrange Equations 


We will show that the principle of stationary action implies that
realizable paths satisfy a set of ordinary differential equations. 
First we will develop tools for investigating how path-dependent 
functions vary as the paths are varied. We will then apply these 
tools to the action, to derive the Lagrange equations. 


Varying a path 

Suppose that we have a function f[q] that depends on a path q.
How does the function vary as the path is varied? Let q be a
coordinate path and q + εη be a varied path, where the function
η is a path-like function that can be added to the path q, and the
factor ε is a scale factor. We define the variation δ_η f[q] of the
function f on the path q by51

$$\delta_\eta f[q] = \lim_{\epsilon \to 0} \left( \frac{f[q + \epsilon\eta] - f[q]}{\epsilon} \right). \qquad (1.19)$$


50 Lagrange’s equations are traditionally written in the form

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0,$$

or, if we write a separate equation for each component of q, as

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q^i} - \frac{\partial L}{\partial q^i} = 0, \qquad i = 0, \ldots, n - 1.$$

In this way of writing Lagrange’s equations the notation does not distinguish
between L, which is a real-valued function of three variables (t, q, q̇), and
L∘Γ[q], which is a real-valued function of one real variable t. If we do not
realize this notational pun, the equations don’t make sense as written:
∂L/∂q̇ is a function of three variables, so we must regard the arguments q, q̇
as functions of t before taking d/dt of the expression. Similarly, ∂L/∂q is a
function of three variables, which we must view as a function of t before
setting it equal to d/dt(∂L/∂q̇). These implicit applications of the chain rule
pose no problem in performing hand computations, once you understand what the
equations represent.


51 The variation operator δ_η is like the derivative operator in that it acts on
the immediately following function: δ_η f[q] = (δ_η f)[q].
The variation of f is a linear approximation to the change in the
function f for small variations in the path. The variation of f
depends on η.

A simple example is the variation of the identity path function:
I[q] = q. Applying the definition,

$$\delta_\eta I[q] = \lim_{\epsilon \to 0} \left( \frac{(q + \epsilon\eta) - q}{\epsilon} \right) = \eta. \qquad (1.20)$$


It is traditional to write δ_η I[q] simply as δq. Another example is
the variation of the path function that returns the derivative of
the path. We have

$$\delta_\eta g[q] = \lim_{\epsilon \to 0} \left( \frac{D(q + \epsilon\eta) - Dq}{\epsilon} \right) = D\eta \quad \text{with} \quad g[q] = Dq. \qquad (1.21)$$

It is traditional to write δ_η g[q] as δDq.
The variation may be represented in terms of a derivative. Let
g(ε) = f[q + εη]; then

$$\delta_\eta f[q] = \lim_{\epsilon \to 0} \left( \frac{g(\epsilon) - g(0)}{\epsilon} \right) = Dg(0). \qquad (1.22)$$
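Definition (1.19), in the form (1.22), can be tried numerically. In this Python sketch, which is our own illustration rather than anything from Scmutils, paths are ordinary functions of time, a path-dependent function maps a path to a function of time, and the limit is replaced by a central difference in ε. The helper names variation and D are hypothetical. Both example path functions are linear in the path, so the results should agree with (1.20) and (1.21) up to rounding.

```python
import math


def variation(f, eta, eps=1e-4):
    """Return q -> (t -> delta_eta f[q](t)), estimating the limit in (1.19)
    by a central difference in epsilon (exact for f linear in the path)."""
    def delta_f(q):
        plus = f(lambda t: q(t) + eps * eta(t))
        minus = f(lambda t: q(t) - eps * eta(t))
        return lambda t: (plus(t) - minus(t)) / (2 * eps)
    return delta_f


def D(g, h=1e-5):
    """Central-difference approximation to the derivative of g."""
    return lambda t: (g(t + h) - g(t - h)) / (2 * h)


identity = lambda q: q           # I[q] = q
deriv = lambda q: D(q)           # g[q] = Dq

q = math.cos                     # an arbitrary path
eta = lambda t: t * t            # an arbitrary variation
t = 0.7
var_I = variation(identity, eta)(q)(t)    # (1.20): expect eta(t)
var_g = variation(deriv, eta)(q)(t)       # (1.21): expect D eta(t) = 2t
```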


Variations have the following derivative-like properties. For
path-dependent functions f and g and constant c:

$$\begin{aligned}
\delta_\eta (f g)[q] &= \delta_\eta f[q]\, g[q] + f[q]\, \delta_\eta g[q] && (1.23)\\
\delta_\eta (f + g)[q] &= \delta_\eta f[q] + \delta_\eta g[q] && (1.24)\\
\delta_\eta (c f)[q] &= c\, \delta_\eta f[q]. && (1.25)
\end{aligned}$$

Let F be a path-independent function and let g be a path-dependent
function; then

$$\delta_\eta h[q] = (DF \circ g[q])\, \delta_\eta g[q] \quad \text{with} \quad h[q] = F \circ g[q]. \qquad (1.26)$$

The operators D (differentiation) and δ (variation) commute in
the following sense:

$$D\delta_\eta f[q] = \delta_\eta g[q] \quad \text{with} \quad g[q] = D(f[q]). \qquad (1.27)$$


Variations also commute with integration in a similar sense. 
If a path-dependent function f is stationary for a particular 
path q with respect to small changes in that path then it must be 
stationary for a subset of those variations that result from adding
small multiples of a particular function η to q. So the statement
δ_η f[q] = 0 for arbitrary η implies the function f is stationary for
small variations of the path around q.


Exercise 1.7: Properties of δ
Show that δ has the properties (1.23)–(1.27).


Exercise 1.8: Implementation of δ


a. Suppose we have a procedure f that implements a path-dependent
function: for path q and time t it has the value ((f q) t). The
procedure delta computes the variation (δ_η f)[q](t) as the value of
((((delta eta) f) q) t). Complete the definition of delta:


(define ((((delta eta) f) q) t) 


) 


b. Use your delta procedure to verify the properties of δ listed in
exercise 1.7 for simple functions such as implemented by the procedure F:

(define ((F q) t)
  ((literal-function 'f) (q t)))


This implements a simple path-dependent function that depends only 
on the coordinates of the path at each moment. 


Varying the action 
The action is the integral of the Lagrangian along a path:

$$S[q](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[q]. \qquad (1.28)$$

For a realizable path q the variation of the action with respect to
any variation η that preserves the endpoints, η(t₁) = η(t₂) = 0, is
zero:

$$\delta_\eta S[q](t_1, t_2) = 0. \qquad (1.29)$$

The variation of the action is

$$\delta_\eta S[q](t_1, t_2) = \int_{t_1}^{t_2} \delta_\eta h[q] \quad \text{where} \quad h[q] = L \circ \Gamma[q]. \qquad (1.30)$$

This follows from the fact that variation commutes with integration.
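Using the fact that

$$\delta_\eta \Gamma[q] = (0, \eta, D\eta), \qquad (1.31)$$

which follows from equations (1.20) and (1.21), and using the chain
rule for variations (1.26), we get52

$$\begin{aligned}
\delta_\eta S[q](t_1, t_2) &= \int_{t_1}^{t_2} (DL \circ \Gamma[q])\, \delta_\eta \Gamma[q] \\
&= \int_{t_1}^{t_2} \left( (\partial_1 L \circ \Gamma[q])\, \eta + (\partial_2 L \circ \Gamma[q])\, D\eta \right). \qquad (1.32)
\end{aligned}$$

Integrating the last term of equation (1.32) by parts gives

$$\delta_\eta S[q](t_1, t_2) = \left. (\partial_2 L \circ \Gamma[q])\, \eta \right|_{t_1}^{t_2} + \int_{t_1}^{t_2} \left\{ (\partial_1 L \circ \Gamma[q]) - D(\partial_2 L \circ \Gamma[q]) \right\} \eta. \qquad (1.33)$$

For our variation η we have η(t₁) = η(t₂) = 0, so the first term
vanishes.
So the variation of the action is zero if and only if

$$0 = \int_{t_1}^{t_2} \left\{ (\partial_1 L \circ \Gamma[q]) - D(\partial_2 L \circ \Gamma[q]) \right\} \eta. \qquad (1.34)$$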


The variation of the action is zero because, by assumption, q is a 
realizable path. Thus (1.34) must be true for any function 7 that 
is zero at the endpoints. 
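This conclusion can be checked numerically for the harmonic oscillator with m = k = 1 on [0, π/2]. The Python sketch below is our own illustration with hypothetical helper names: it approximates the action by Simpson’s rule and δ_η S by a central difference in ε, using the bump η(t) = sin 2t, which vanishes at both endpoints. For the realizable path cos t the variation should vanish; for the straight line between the same endpoints it does not (integrating (1.32) by hand gives −1/2).

```python
import math


def action(q, Dq, t1, t2, m=1.0, k=1.0, n=200):
    """S[q](t1, t2) for L = m v^2/2 - k q^2/2, by Simpson's rule (n even)."""
    h = (t2 - t1) / n
    def L_on_path(t):
        return 0.5 * m * Dq(t) ** 2 - 0.5 * k * q(t) ** 2
    total = L_on_path(t1) + L_on_path(t2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * L_on_path(t1 + i * h)
    return total * h / 3


def action_variation(q, Dq, eta, Deta, t1, t2, eps=1e-4):
    """delta_eta S[q](t1, t2) as a central difference in epsilon."""
    plus = action(lambda t: q(t) + eps * eta(t),
                  lambda t: Dq(t) + eps * Deta(t), t1, t2)
    minus = action(lambda t: q(t) - eps * eta(t),
                   lambda t: Dq(t) - eps * Deta(t), t1, t2)
    return (plus - minus) / (2 * eps)


t1, t2 = 0.0, math.pi / 2
eta = lambda t: math.sin(2 * t)           # bump with eta(t1) = eta(t2) = 0
Deta = lambda t: 2 * math.cos(2 * t)

# Realizable path q(t) = cos t: the variation should vanish.
true_var = action_variation(math.cos, lambda t: -math.sin(t),
                            eta, Deta, t1, t2)
# Straight line between the same endpoints: not realizable.
line_var = action_variation(lambda t: 1 - 2 * t / math.pi,
                            lambda t: -2 / math.pi, eta, Deta, t1, t2)
```

Because this action is quadratic in ε and Simpson’s rule is linear in the integrand, the central difference isolates the first-order term exactly; the remaining error comes from quadrature and rounding.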

We retain enough freedom in the choice of the variation so that
this forces the factor in the integrand multiplying η to be zero at
each point along the path. We argue by contradiction: Suppose
this factor were nonzero at some particular time. Then it would
have to be nonzero in at least one of its components. But if we
choose our η to be a bump that is nonzero only in that component
in a neighborhood of that time, and zero everywhere else, then the
integral will be nonzero. So we may conclude that the factor in
curly brackets is identically zero:53

52 A function of multiple arguments is considered a function of a tuple of its
arguments. Thus, the derivative of a function of multiple arguments is a
tuple of the partial derivatives of that function with respect to each of the
arguments. So in the case of a Lagrangian L

$$DL(t, q, v) = [\partial_0 L(t, q, v), \partial_1 L(t, q, v), \partial_2 L(t, q, v)].$$


$$D(\partial_2 L \circ \Gamma[q]) - (\partial_1 L \circ \Gamma[q]) = 0. \qquad (1.35)$$


This is just what we set out to obtain, the Lagrange equations. 

A path satisfying Lagrange’s equations is one for which the 
action is stationary, and the fact that the action is stationary de- 
pends only on the values of L at each point of the path (and at 
each point on nearby paths), but not on the coordinate system we 
use to compute these values. So if the system’s path satisfies La- 
grange’s equations in some particular coordinate system, it must 
satisfy Lagrange’s equations in any coordinate system. Thus the 
equations of variational mechanics are derived the same way in 
any configuration space and any coordinate system. 


Harmonic oscillator 
For an example, consider the harmonic oscillator. A Lagrangian 
is 


$$L(t, x, v) = \tfrac{1}{2} m v^2 - \tfrac{1}{2} k x^2. \qquad (1.36)$$

Then

$$\partial_1 L(t, x, v) = -kx \quad \text{and} \quad \partial_2 L(t, x, v) = mv. \qquad (1.37)$$

The Lagrangian is applied to a tuple of the time, a coordinate,
and a velocity. The symbols t, x, and v are arbitrary; they are
used to specify formal parameters of the Lagrangian.

Now suppose we have a configuration path y, which gives the
coordinate of the oscillator y(t) for each time t. The initial
segment of the corresponding local tuple at time t is

$$\Gamma[y](t) = (t, y(t), Dy(t)). \qquad (1.38)$$

So

$$\partial_1 L \circ \Gamma[y](t) = -k y(t) \quad \text{and} \quad \partial_2 L \circ \Gamma[y](t) = m Dy(t), \qquad (1.39)$$

and

$$D(\partial_2 L \circ \Gamma[y])(t) = m D^2 y(t), \qquad (1.40)$$

so the Lagrange equation is

$$m D^2 y(t) + k y(t) = 0, \qquad (1.41)$$

which is the equation of motion of the harmonic oscillator.

53 To make this argument more precise requires careful analysis.
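Equation (1.41) can also be integrated numerically as a check. This Python sketch uses the classical fourth-order Runge–Kutta method, which is our choice of integrator and not something the text prescribes; the function name is hypothetical. With y(0) = 1 and Dy(0) = 0 the exact solution is cos(√(k/m) t), the form (1.17) with the frequency √(k/m) restored.

```python
import math


def integrate_oscillator(m, k, y0, v0, t_end, steps=1000):
    """Integrate m D^2 y + k y = 0 (equation 1.41) with classical RK4,
    rewritten as the first-order system Dy = v, Dv = -(k/m) y."""
    def rhs(pos, vel):
        return vel, -(k / m) * pos

    h = t_end / steps
    y, v = y0, v0
    for _ in range(steps):
        s1 = rhs(y, v)
        s2 = rhs(y + h / 2 * s1[0], v + h / 2 * s1[1])
        s3 = rhs(y + h / 2 * s2[0], v + h / 2 * s2[1])
        s4 = rhs(y + h * s3[0], v + h * s3[1])
        y += h / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        v += h / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
    return y, v


m, k = 2.0, 8.0                       # sqrt(k/m) = 2
y_end, v_end = integrate_oscillator(m, k, y0=1.0, v0=0.0, t_end=1.3)
omega = math.sqrt(k / m)
y_exact = math.cos(omega * 1.3)       # y(0) = 1, Dy(0) = 0
v_exact = -omega * math.sin(omega * 1.3)
```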


Orbital motion 
As another example, consider the two-dimensional motion of a
particle of mass m with gravitational potential energy −μ/r,
where r is the distance to the center of attraction. A Lagrangian
is54

$$L(t; \xi, \eta; v_\xi, v_\eta) = \tfrac{1}{2} m (v_\xi^2 + v_\eta^2) + \frac{\mu}{\sqrt{\xi^2 + \eta^2}}, \qquad (1.42)$$

where ξ and η are formal parameters for rectangular coordinates
of the particle, and v_ξ and v_η are formal parameters for
corresponding rectangular velocity components. Then55

$$\begin{aligned}
\partial_1 L(t; \xi, \eta; v_\xi, v_\eta) &= [\partial_{1,0} L(t; \xi, \eta; v_\xi, v_\eta),\; \partial_{1,1} L(t; \xi, \eta; v_\xi, v_\eta)] \\
&= \left[ \frac{-\mu \xi}{(\xi^2 + \eta^2)^{3/2}},\; \frac{-\mu \eta}{(\xi^2 + \eta^2)^{3/2}} \right]. \qquad (1.43)
\end{aligned}$$

Similarly,

$$\partial_2 L(t; \xi, \eta; v_\xi, v_\eta) = [m v_\xi, m v_\eta]. \qquad (1.44)$$

Now suppose we have a configuration path q = (x, y), so that
the coordinate tuple at time t is q(t) = (x(t), y(t)). The initial
segment of the local tuple at time t is

$$\Gamma[q](t) = (t; x(t), y(t); Dx(t), Dy(t)). \qquad (1.45)$$

So

$$\begin{aligned}
\partial_1 L \circ \Gamma[q](t) &= \left[ \frac{-\mu x(t)}{((x(t))^2 + (y(t))^2)^{3/2}},\; \frac{-\mu y(t)}{((x(t))^2 + (y(t))^2)^{3/2}} \right] \\
\partial_2 L \circ \Gamma[q](t) &= [m Dx(t), m Dy(t)] \qquad (1.46)
\end{aligned}$$

and

$$D(\partial_2 L \circ \Gamma[q])(t) = [m D^2 x(t), m D^2 y(t)]. \qquad (1.47)$$

The component Lagrange equations at time t are

$$\begin{aligned}
m D^2 x(t) + \frac{\mu x(t)}{((x(t))^2 + (y(t))^2)^{3/2}} &= 0 \\
m D^2 y(t) + \frac{\mu y(t)}{((x(t))^2 + (y(t))^2)^{3/2}} &= 0. \qquad (1.48)
\end{aligned}$$

54 When we write a definition that names the components of the local tuple, we
indicate that these are grouped into time, position, and velocity components
by separating the groups with semicolons.

55 The derivative with respect to a tuple is a tuple of the partial derivatives
with respect to each component of the tuple (see the appendix on notation).
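As a spot check of (1.48), a circular orbit x(t) = R cos ωt, y(t) = R sin ωt satisfies both equations exactly when mω² = μ/R³. The Python sketch below, with hypothetical helper names and finite-difference second derivatives, evaluates the left-hand sides of (1.48) for that orbit and for one with the wrong angular frequency.

```python
import math


def orbital_residuals(x, y, t, m, mu, h=1e-4):
    """Left-hand sides of the component Lagrange equations (1.48) at time t,
    with D^2 estimated by a central second difference of step h."""
    def second(f):
        return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)
    r3 = (x(t) ** 2 + y(t) ** 2) ** 1.5
    return (m * second(x) + mu * x(t) / r3,
            m * second(y) + mu * y(t) / r3)


def circle(R, omega):
    """Coordinate functions of a circular path of radius R."""
    return (lambda t: R * math.cos(omega * t),
            lambda t: R * math.sin(omega * t))


m, mu, R = 1.0, 1.0, 2.0
omega = math.sqrt(mu / (m * R ** 3))       # circular-orbit condition
good = orbital_residuals(*circle(R, omega), 0.8, m, mu)
bad = orbital_residuals(*circle(R, 1.2 * omega), 0.8, m, mu)
```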


Exercise 1.9: Lagrange’s equations 


Derive the Lagrange equations for the following systems, showing all of 
the intermediate steps as we did in the harmonic oscillator and orbital 
motion examples. 


a. A particle of mass m moves in a two-dimensional potential V(x, y) =
(x² + y²)/2 + x²y − y³/3, where x and y are rectangular coordinates of
the particle. A Lagrangian for this system is
L(t; x, y; v_x, v_y) = ½m(v_x² + v_y²) − V(x, y).


b. An ideal planar pendulum consists of a bob of mass m connected to
a pivot by a massless rod of length l subject to uniform gravitational
acceleration g. A Lagrangian for this system is
L(t, θ, θ̇) = ½ml²θ̇² + mgl cos θ. The formal parameters of L are t, θ,
and θ̇; θ measures the angle of the pendulum rod to a plumb line and θ̇
is the angular velocity of the rod.56


c. A Lagrangian for a particle of mass m constrained to move on a
sphere of radius R is L(t; θ, φ; α, β) = ½mR²(α² + (β sin θ)²). The angle
θ is the colatitude of the particle and φ is the longitude; the rate of
change of the colatitude is α and the rate of change of the longitude is β.


56 The symbol θ̇ is just a mnemonic symbol; the dot over the θ is not intended
to indicate differentiation. To define L we could just as well have written:
L(a, b, c) = ½ml²c² + mgl cos b. However, we use a dotted symbol to remind
us that the argument matching a formal parameter, such as θ̇, is a rate of
change of an angle, such as θ.




Exercise 1.10: Higher derivative Lagrangians 


Derive Lagrange’s equations for Lagrangians that depend on the
accelerations. In particular, show that the Lagrange equations for
Lagrangians of the form L(t, q, q̇, q̈) with q̈ terms are:57

$$D^2(\partial_3 L \circ \Gamma[q]) - D(\partial_2 L \circ \Gamma[q]) + \partial_1 L \circ \Gamma[q] = 0. \qquad (1.49)$$


In general, these equations, first derived by Poisson, will involve the 
fourth derivative of q. Note that the derivation is completely analogous 
to the derivation of the Lagrange equations without accelerations; it is 
just longer. What restrictions must we place on the variations so that 
the critical path satisfies a differential equation? 


1.5.2 Computing Lagrange’s Equations 


The procedure for computing Lagrange’s equations mirrors the
functional expression (1.18), where the procedure Gamma implements
Γ:58

(define ((Lagrange-equations Lagrangian) q)
  (- (D (compose ((partial 2) Lagrangian) (Gamma q)))
     (compose ((partial 1) Lagrangian) (Gamma q))))


The argument of Lagrange-equations is a procedure that com- 
putes a Lagrangian. It returns a procedure that when applied to 
a path q returns a procedure of one argument (time) that com- 
putes the left-hand side of the Lagrange equations (1.18). These 
residual values are zero if q is a path for which the Lagrangian 
action is stationary. 
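The same recipe can be mimicked numerically for a single coordinate. In the Python sketch below, which is our illustration rather than the Scmutils implementation, derivatives are central finite differences, gamma plays the role of Γ[q] truncated to (t, q(t), Dq(t)), and the returned function computes the residual D(∂₂L∘Γ[q]) − ∂₁L∘Γ[q] at a time t. All names are hypothetical.

```python
import math


def partial(f, i, args, eps=1e-5):
    """Central-difference partial derivative of f(t, q, v) in slot i."""
    lo, hi = list(args), list(args)
    lo[i] -= eps
    hi[i] += eps
    return (f(*hi) - f(*lo)) / (2 * eps)


def derivative(g, h=1e-4):
    """Central-difference derivative of a function of time."""
    return lambda t: (g(t + h) - g(t - h)) / (2 * h)


def lagrange_equations(L):
    """Return q -> (t -> D(d2 L o Gamma[q])(t) - d1 L o Gamma[q](t))."""
    def on_path(q):
        Dq = derivative(q)
        gamma = lambda t: (t, q(t), Dq(t))            # Gamma[q](t), truncated
        p = lambda t: partial(L, 2, gamma(t))         # d2 L o Gamma[q]
        dp = derivative(p)                            # D(d2 L o Gamma[q])
        return lambda t: dp(t) - partial(L, 1, gamma(t))
    return on_path


def L_harmonic(m, k):
    return lambda t, q, v: 0.5 * m * v * v - 0.5 * k * q * q


residual = lagrange_equations(L_harmonic(1.0, 4.0))
good = residual(lambda t: math.cos(2.0 * t))(0.3)   # omega = sqrt(k/m) = 2
bad = residual(lambda t: math.cos(3.0 * t))(0.3)    # wrong frequency
```

For the wrong frequency the residual should come out near cos(3t)(k − m·3²) = −5 cos 0.9 at t = 0.3, the same k − mω² pattern that appears in the harmonic-oscillator example of this section.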

Observe that the Lagrange-equations procedure, like the Lagrange
equations themselves, is valid for any generalized coordinate
system. When we write programs to investigate particular
systems, the procedures that implement the Lagrangian function
and the path q will reflect the actual coordinates chosen to
represent the system, but we use the same Lagrange-equations
procedure in each case. This abstraction reflects the important fact
that the method of derivation of Lagrange’s equations from a
Lagrangian is always the same; it is independent of the number of
degrees of freedom, the topology of the configuration space, and
the coordinate system used to describe points in the configuration
space.

57 In traditional notation these equations read

$$\frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} + \frac{\partial L}{\partial q} = 0.$$

58 The Lagrange-equations procedure uses the operations (partial 1) and
(partial 2), which implement the partial derivative operators with respect
to the second and third argument positions (those with indices 1 and 2).


The free particle 

Consider again the case of a free particle. The Lagrangian is 
implemented by the procedure L-free-particle. Rather than 
numerically integrating and minimizing the action, as we did in 
section 1.4, we can check Lagrange’s equations for an arbitrary 
straight-line path t ↦ (at + a₀, bt + b₀, ct + c₀):

(define (test-path t)
  (up (+ (* 'a t) 'a0)
      (+ (* 'b t) 'b0)
      (+ (* 'c t) 'c0)))

(print-expression
 (((Lagrange-equations (L-free-particle 'm))
   test-path)
  't))
(down 0 0 0)


That the residuals are zero indicates that the test-path satisfies 
the Lagrange equations.59

Instead of checking the equations for an individual path in 
three-dimensional space, we can also apply the Lagrange-equations 
procedure to an arbitrary function:60 


(show-expression
 (((Lagrange-equations (L-free-particle 'm))
   (literal-function 'x))
  't))
(* (((expt D 2) x) t) m)

mD²x(t)

The result is an expression containing the arbitrary time t and
mass m, so (for nonzero m) it is zero precisely when D²x = 0, which
is the expected equation for a free particle.

59 There is a Lagrange equation for every degree of freedom. The residuals of
all the equations are zero if the path is realizable. The residuals are arranged
in a down tuple because they result from derivatives of the Lagrangian with
respect to argument slots that take up tuples. See the appendix on notation.

60 Observe that the second derivative is indicated as the square of the derivative
operator (expt D 2). Arithmetic operations in Scmutils extend over operators
as well as functions.


The harmonic oscillator 

Consider the harmonic oscillator again, with Lagrangian (1.16). 
We know that the motion of a harmonic oscillator is a sinusoid 
with a given amplitude, frequency and phase: 


$$x(t) = a\cos(\omega t + \varphi). \qquad (1.50)$$


Suppose we have forgotten how the constants in the solution relate 
to the physical parameters of the oscillator. Let’s plug in the 
proposed solution and look at the residual: 


(define (proposed-solution t)
  (* 'a (cos (+ (* 'omega t) 'phi))))

(show-expression
 (((Lagrange-equations (L-harmonic 'm 'k))
   proposed-solution)
  't))

cos(ωt + φ) a (k − mω²)

The residual here shows that for nonzero amplitude, the only
solutions allowed are ones where (k − mω²) = 0, or ω = √(k/m).
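The residual can also be checked by hand. For L-harmonic the Lagrange equation is mD²x + kx = 0 (compare (1.41)), and substituting the proposed solution (1.50) gives, as a short worked equation:

$$\begin{aligned}
D^2 x(t) &= -a\omega^2 \cos(\omega t + \varphi), \\
m D^2 x(t) + k x(t) &= a\cos(\omega t + \varphi)\,(k - m\omega^2),
\end{aligned}$$

which matches the printed residual factor for factor.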


Exercise 1.11: 


Compute Lagrange’s equations for the Lagrangians in exercise 1.9 using 
the Lagrange-equations procedure. Additionally, use the computer to 
perform each of the steps in the Lagrange-equations procedure and 
show the intermediate results. Relate these steps to the ones you showed 
in the hand derivation of exercise 1.9. 


Exercise 1.12: 


a. Write a procedure to compute the Lagrange equations for Lagrangians 
that depend upon acceleration, as in exercise 1.10. 
b. Use your procedure to compute the Lagrange equations for the
Lagrangian

$$L(t, x, v, a) = -\tfrac{1}{2} m x a - \tfrac{1}{2} k x^2.$$


Do you recognize the resulting equation of motion? 


c. For more fun write the general Lagrange equation procedure that 
takes a Lagrangian of any order, and the order, to produce the required 
equations of motion. 


1.6 How to Find Lagrangians 


Lagrange’s equations are a system of second-order differential 
equations. In order to use them to compute the evolution of a 
mechanical system we must find a suitable Lagrangian for the 
system. There is no general way to construct a Lagrangian for 
every system, but there is an important class of systems for which 
we can identify Lagrangians in a straightforward way in terms of 
kinetic and potential energy. The key idea is to construct a
Lagrangian L such that Lagrange’s equations are Newton’s equations
F⃗ = ma⃗.

Suppose our system consists of N particles indexed by α, with
mass m_α and vector position x⃗_α(t). Suppose further that the
forces acting on the particles can be written in terms of a gradient
of a potential energy 𝒱, which is a function of the positions of
the particles and possibly time, but which does not depend on the
velocities. In other words, the force on particle α is F⃗_α = −∇_{x⃗_α}𝒱,
where ∇_{x⃗_α}𝒱 is the gradient of 𝒱 with respect to the position of
the particle with index α. We can write Newton’s equations as

$$D(m_\alpha D\vec{x}_\alpha)(t) + \nabla_{\vec{x}_\alpha} \mathcal{V}(t, \vec{x}_0(t), \ldots, \vec{x}_{N-1}(t)) = 0. \qquad (1.51)$$


Vectors can be represented as tuples of components of the vectors
on a rectangular basis. So x⃗_α(t) is represented as the tuple
x_α(t). Let V be the potential energy function expressed in terms
of components:

$$V(t; x_0(t), \ldots, x_{N-1}(t)) = \mathcal{V}(t, \vec{x}_0(t), \ldots, \vec{x}_{N-1}(t)). \qquad (1.52)$$

Newton’s equations are

$$D(m_\alpha D x_\alpha)(t) + \partial_{1,\alpha} V(t; x_0(t), \ldots, x_\alpha(t), \ldots, x_{N-1}(t)) = 0, \qquad (1.53)$$
where ∂_{1,α}V is the partial derivative of V with respect to the x_α(t)
argument slot.

To form the Lagrange equations we collect all the position
components of all the particles into one tuple x(t), so
x(t) = (x₀(t), ..., x_{N−1}(t)). The Lagrange equations for the coordinate
path x are

$$D(\partial_2 L \circ \Gamma[x]) - (\partial_1 L \circ \Gamma[x]) = 0. \qquad (1.54)$$


Observe that Newton’s equations (1.51) are just the components
of the Lagrange equations (1.54) if we choose L to have the
properties

$$\begin{aligned}
\partial_2 L \circ \Gamma[x](t) &= [m_0 D x_0(t), \ldots, m_{N-1} D x_{N-1}(t)] \\
\partial_1 L \circ \Gamma[x](t) &= [-\partial_{1,0} V(t, x(t)), \ldots, -\partial_{1,N-1} V(t, x(t))], \qquad (1.55)
\end{aligned}$$

where V(t, x(t)) = V(t; x₀(t), ..., x_{N−1}(t)) and ∂_{1,α}V(t, x(t)) is
the tuple of the components of the derivative of V with respect
to the coordinates of the particle with index α, evaluated at time
t and coordinates x(t). These conditions are satisfied if for every
a_α and b_α

$$\partial_2 L(t; a_0, \ldots, a_{N-1}; b_0, \ldots, b_{N-1}) = [m_0 b_0, \ldots, m_{N-1} b_{N-1}] \qquad (1.56)$$

and

$$\partial_1 L(t; a_0, \ldots, a_{N-1}; b_0, \ldots, b_{N-1}) = [-\partial_{1,0} V(t, a), \ldots, -\partial_{1,N-1} V(t, a)], \qquad (1.57)$$

where a = (a₀, ..., a_{N−1}). We use the symbols a and b to emphasize
that these are just formal parameters of the Lagrangian. One
choice for L that has the required properties (1.56–1.57) is

$$L(t, x, v) = \frac{1}{2} \sum_\alpha m_\alpha v_\alpha^2 - V(t, x), \qquad (1.58)$$

where v_α² is the sum of the squares of the components of v_α.61
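To see that (1.58) has the required properties, differentiate it slot by slot. Writing a for the coordinate arguments and b for the velocity arguments, as in (1.56)–(1.57), a short worked check is:

$$\begin{aligned}
L(t; a_0, \ldots, a_{N-1}; b_0, \ldots, b_{N-1}) &= \frac{1}{2}\sum_\alpha m_\alpha\, b_\alpha \cdot b_\alpha - V(t; a_0, \ldots, a_{N-1}), \\
\partial_{2,\alpha} L &= \tfrac{1}{2} m_\alpha \cdot 2 b_\alpha = m_\alpha b_\alpha, \\
\partial_{1,\alpha} L &= -\partial_{1,\alpha} V(t, a).
\end{aligned}$$

The kinetic term does not depend on the coordinates and the potential term does not depend on the velocities, so each partial derivative picks up only one of the two terms.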


61 Remember that x and v are just formal parameters of the Lagrangian. This
x is not the path x used earlier in the derivation, though it could be the value 
of that path at a particular time. 
The first term is the kinetic energy, conventionally denoted T.
So this choice for the Lagrangian is L(t, x, v) = T(t, x, v) − V(t, x),
the difference of the kinetic and potential energy. We will often
extend the arguments of the potential energy function to formally
include the velocities so that we can write L = T − V.62


Hamilton’s principle 

Given a system of point particles for which we can identify the
force as the (negative) derivative of a potential energy V that is
independent of velocity, we have shown that the system evolves
along a path that satisfies Lagrange’s equations with L = T − V.
Having identified a Lagrangian for this class of systems, we can
restate the principle of stationary action in terms of energies. This
statement is known as Hamilton’s Principle: A point-particle system
for which the force is derived from a potential energy that
is independent of velocity evolves along a path q for which the
action

$$S[q](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[q]$$

is stationary with respect to variations of the path q that leave
the endpoints fixed, where L = T − V is the difference between
kinetic and potential energy.63


62 We can always give a function extra arguments that are not used so that it
can be algebraically combined with other functions of the same shape.

63 Hamilton formulated the fundamental variational principle for
time-independent systems in 1834–1835. Jacobi gave this principle the name
“Hamilton’s principle.” For systems subject to generic, nonstationary
constraints Hamilton’s principle was investigated in 1848 by Ostrogradsky. In
the Russian literature Hamilton’s principle is often called the
Hamilton–Ostrogradsky principle.

William Rowan Hamilton (1805–1865) was a brilliant 19th-century
mathematician. His early work on geometric optics (based on Fermat’s principle)
was so impressive that he was elected to the post of Professor of Astronomy at
Trinity College and Royal Astronomer of Ireland while he was still an
undergraduate. He produced two monumental works of 19th-century mathematics.
His discovery of quaternions revitalized abstract algebra and sparked the
development of vector techniques in physics. His 1835 memoir “On a General
Method in Dynamics” put variational mechanics on a firm footing, finally
giving substance to Maupertuis’s vaguely stated Principle of Least Action of 100
years before. Hamilton also wrote poetry and carried on an extensive
correspondence with Wordsworth, who advised him to put his energy into writing
mathematics rather than poetry.

In addition to the formulation of the fundamental variational principle,
Hamilton also stressed the analogy between geometric optics and mechanics,
and stressed the importance of the momentum variables (which were earlier
introduced by Lagrange and Cauchy), leading to the “canonical” form of
mechanics, which we discuss in chapter 3.

It might seem that we have reduced Lagrange’s equations to
nothing more than F⃗ = ma⃗, and indeed, the principle is motivated
by comparing the two equations for this special class of systems.
However, the Lagrangian formulation of the equations of motion
has an important advantage over F⃗ = ma⃗. Our derivation used
the rectangular components x_α of the positions of the constituent
particles for the generalized coordinates, but if the system’s path
satisfies Lagrange’s equations in some particular coordinate
system, it must satisfy the equations in any coordinate system. Thus
we see that L = T − V is suitable as a Lagrangian, with any set of
generalized coordinates. The equations of variational mechanics
are derived the same way in any configuration space and any
coordinate system. In contrast, the Newtonian formulation is based
on elementary geometry: in order for D²x⃗(t) to be meaningful
as an acceleration, x⃗(t) must be a vector in physical space.
Lagrange’s equations have no such restriction on the meaning of the
coordinate q. The generalized coordinates can be any parameters
that conveniently describe the configurations of the system.


Constant acceleration

Consider a particle of mass m in a uniform gravitational field with
acceleration g. The potential energy is mgh where h is the height
of the particle. The kinetic energy is just ½mv². A Lagrangian
for the system is the difference of the kinetic and potential
energies. In rectangular coordinates, with y measuring the vertical
position and x measuring the horizontal position, the Lagrangian
is L(t; x, y; v_x, v_y) = ½m(v_x² + v_y²) − mgy. We have

(define ((L-uniform-acceleration m g) local)
  (let ((q (coordinate local))
        (v (velocity local)))
    (let ((y (ref q 1)))
      (- (* 1/2 m (square v)) (* m g y)))))
(show-expression
 (((Lagrange-equations
    (L-uniform-acceleration 'm 'g))
   (up (literal-function 'x)
       (literal-function 'y)))
  't))

[ mD²x(t)
  gm + mD²y(t) ]

This equation describes unaccelerated motion in the horizontal
direction (mD²x(t) = 0) and constant acceleration in the vertical
direction (mD²y(t) = −gm).


Central force field 

Consider planar motion of a particle of mass m in a central force 
field, with an arbitrary potential energy U(r) depending only upon 
the distance r to the center of attraction. We will derive the La- 
grange equations for this system in both rectangular coordinates 
and polar coordinates. 

In rectangular coordinates (x, y), with origin at the center of
attraction, the potential energy is V(t; x, y) = U(√(x² + y²)). The
kinetic energy is T(t; x, y; v_x, v_y) = ½m(v_x² + v_y²). A Lagrangian
for the system is L = T − V:

$$L(t; x, y; v_x, v_y) = \tfrac{1}{2} m (v_x^2 + v_y^2) - U\left(\sqrt{x^2 + y^2}\right). \qquad (1.59)$$

As a procedure:

(define ((L-central-rectangular m U) local)
  (let ((q (coordinate local))
        (v (velocity local)))
    (- (* 1/2 m (square v))
       (U (sqrt (square q))))))


The Lagrange equations are 


(show-expression
 (((Lagrange-equations
    (L-central-rectangular 'm (literal-function 'U)))
   (up (literal-function 'x)
       (literal-function 'y)))
  't))

[ mD²x(t) + x(t) DU(√((x(t))² + (y(t))²)) / √((x(t))² + (y(t))²)
  mD²y(t) + y(t) DU(√((x(t))² + (y(t))²)) / √((x(t))² + (y(t))²) ]

Setting these residuals to zero, we can write the Lagrange equations as

$$m D^2 x(t) = -\frac{x(t)}{r(t)}\, DU(r(t)) \qquad (1.60)$$

$$m D^2 y(t) = -\frac{y(t)}{r(t)}\, DU(r(t)), \qquad (1.61)$$

where r(t) = √((x(t))² + (y(t))²). We can interpret these as
follows. The particle is subject to a radially directed force with
magnitude −DU(r). Newton’s equations equate the force with
the product of the mass and the acceleration. The two Lagrange
equations are just the rectangular components of Newton’s
equations.

We can describe the same system in polar coordinates. The
relationship between rectangular coordinates (x, y) and polar
coordinates (r, φ) is:

$$x = r\cos\varphi, \qquad y = r\sin\varphi. \qquad (1.62)$$

The relationship of the generalized velocities is derived from the
coordinate transformation. Consider a configuration path that is
represented in both rectangular and polar coordinates. Let x̄ and
ȳ be components of the rectangular coordinate path, and let r̄ and
φ̄ be components of the corresponding polar coordinate path. The
rectangular components at time t are (x̄(t), ȳ(t)), and the polar
coordinates at time t are (r̄(t), φ̄(t)). They are related by (1.62):

$$\bar x(t) = \bar r(t)\cos\bar\varphi(t), \qquad \bar y(t) = \bar r(t)\sin\bar\varphi(t). \qquad (1.63)$$
The rectangular velocity at time t is (Dx̄(t), Dȳ(t)). Differentiating
(1.63) gives the relationship among the velocities

$$\begin{aligned}
D\bar x(t) &= D\bar r(t)\cos\bar\varphi(t) - \bar r(t)\, D\bar\varphi(t) \sin\bar\varphi(t) \\
D\bar y(t) &= D\bar r(t)\sin\bar\varphi(t) + \bar r(t)\, D\bar\varphi(t) \cos\bar\varphi(t). \qquad (1.64)
\end{aligned}$$

These relations are valid for any configuration path at any
moment, so we can abstract them to relations among coordinate
representations of an arbitrary velocity. Let v_x and v_y be the
rectangular components of the velocity, and ṙ and φ̇ be the rates
of change of r and φ. Then

$$\begin{aligned}
v_x &= \dot r\cos\varphi - r\dot\varphi\sin\varphi \\
v_y &= \dot r\sin\varphi + r\dot\varphi\cos\varphi. \qquad (1.65)
\end{aligned}$$

The kinetic energy is ½m(v_x² + v_y²):

$$T(t; r, \varphi; \dot r, \dot\varphi) = \tfrac{1}{2} m (\dot r^2 + r^2 \dot\varphi^2), \qquad (1.66)$$

and the Lagrangian is

$$L(t; r, \varphi; \dot r, \dot\varphi) = \tfrac{1}{2} m (\dot r^2 + r^2 \dot\varphi^2) - U(r). \qquad (1.67)$$
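The velocity relations (1.64)–(1.66) are easy to spot-check numerically before turning (1.67) into a procedure. The Python sketch below picks an arbitrary polar path (r̄(t) = 1 + t/2 and φ̄(t) = 0.3t², chosen only for this test), compares finite-difference rectangular velocities with (1.65), and confirms that ½m(v_x² + v_y²) equals (1.66). The function name is hypothetical.

```python
import math


def polar_to_rect_velocity(r, phi, rdot, phidot):
    """Rectangular velocity components (v_x, v_y) from relation (1.65)."""
    vx = rdot * math.cos(phi) - r * phidot * math.sin(phi)
    vy = rdot * math.sin(phi) + r * phidot * math.cos(phi)
    return vx, vy


# An arbitrary polar path, chosen only for this check.
r_path = lambda t: 1.0 + 0.5 * t          # D r = 0.5
phi_path = lambda t: 0.3 * t * t          # D phi = 0.6 t
x_path = lambda t: r_path(t) * math.cos(phi_path(t))
y_path = lambda t: r_path(t) * math.sin(phi_path(t))

t, h, m = 1.1, 1e-5, 3.0
Dx = (x_path(t + h) - x_path(t - h)) / (2 * h)   # finite-difference velocity
Dy = (y_path(t + h) - y_path(t - h)) / (2 * h)
rdot, phidot = 0.5, 0.6 * t
vx, vy = polar_to_rect_velocity(r_path(t), phi_path(t), rdot, phidot)

T_rect = 0.5 * m * (vx ** 2 + vy ** 2)
T_polar = 0.5 * m * (rdot ** 2 + (r_path(t) * phidot) ** 2)   # (1.66)
```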
We express this Lagrangian as follows: 


(define ((L-central-polar m U) local)
  (let ((q (coordinate local))
        (qdot (velocity local)))
    (let ((r (ref q 0)) (phi (ref q 1))
          (rdot (ref qdot 0)) (phidot (ref qdot 1)))
      (- (* 1/2 m
            (+ (square rdot)
               (square (* r phidot))))
         (U r)))))


Lagrange’s equations are:


(show-expression
 (((Lagrange-equations
    (L-central-polar 'm (literal-function 'U)))
   (up (literal-function 'r)
       (literal-function 'phi)))
  't))
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mD?r (t) — mr (t) (Dy (t))? + DU (r (t)) 


2mDr (t) r (t) Dy (t) + mD?y (t) (r (4)? 


We can interpret the first equation as expressing that the product of the mass and the radial acceleration is the sum of the force due to the potential and the centrifugal force. The second equation can be interpreted as saying that the derivative of the angular momentum $m r^2 D\varphi$ is zero, so angular momentum is conserved.⁶⁴
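Setting each component of the displayed expression to zero, as Lagrange's equations require, the two interpretations can be made explicit. A sketch of the rearrangement; the second line is just the product rule applied to $m r^2 D\varphi$:

```latex
\begin{aligned}
m\,D^2 r(t) &= m\,r(t)\,\big(D\varphi(t)\big)^2 - DU\big(r(t)\big), \\
D\big(m\,r^2\,D\varphi\big)(t) &= 2m\,r(t)\,Dr(t)\,D\varphi(t) + m\,\big(r(t)\big)^2 D^2\varphi(t) = 0 .
\end{aligned}
```

The first term on the right of the first line is the centrifugal force and the second is the force derived from the potential.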

Note that we used the same Lagrange-equations procedure 
for the derivation in both coordinate systems. Coordinate repre- 
sentations of the Lagrangian are different for different coordinate 
systems, and the Lagrange equations in different coordinate sys- 
tems look different. Yet, the same method is used to derive the 
Lagrange equations in any coordinate system. 


Exercise 1.13: 


Check that the Lagrange equations for central force motion in polar 
coordinates and the Lagrange equations in rectangular coordinates are 
equivalent. Determine the relationship among the second derivatives 
by substituting paths into the transformation equations and computing 
derivatives, then substitute these relations into the equations of motion. 


1.6.1 Coordinate Transformations 


The motion of a system is independent of the coordinates we use to 
describe it. This coordinate-free nature of the motion is apparent 
in the action principle. The action depends only on the value of the 
Lagrangian along the path and not on the particular coordinates 
used in the representation of the Lagrangian. We can use this 
property to find a Lagrangian in one coordinate system in terms 
of a Lagrangian in another coordinate system. 

Suppose we have a mechanical system whose motion is de- 
scribed by a Lagrangian L that depends on time, coordinates, 
and velocities. And suppose we have a coordinate transformation 
F such that x = F(t, x’). The Lagrangian L is expressed in terms 
of the unprimed coordinates. We want to find a Lagrangian L’ ex- 
pressed in the primed coordinates that describes the same system. 
One way to do this is to require that the value of the Lagrangian 
along any configuration path be independent of the coordinate 


64 We will talk much more about angular momentum later.
system. If $q$ is a path in the unprimed coordinates and $q'$ is the corresponding path in primed coordinates, then the Lagrangians must satisfy:

$$L' \circ \Gamma[q'] = L \circ \Gamma[q]. \tag{1.68}$$


We have seen that the transformation from rectangular to polar 
coordinates implies that the generalized velocities transform in a 
certain way. The velocity transformation can be deduced from the 
requirement that a path in polar coordinates and a corresponding 
path in rectangular coordinates are consistent with the coordinate 
transformation. In general, the requirement that paths in two 
different coordinate systems are consistent with the coordinate 
transformation can be used to deduce how all of the components 
of the local tuple transform. Given a coordinate transformation $F$, let $C$ be the corresponding function that maps local tuples in the primed coordinate system to corresponding local tuples in the unprimed coordinate system:

$$C \circ \Gamma[q'] = \Gamma[q]. \tag{1.69}$$


We will deduce the general form of $C$ below. Given such a local-tuple transformation $C$, a Lagrangian $L'$ that satisfies equation (1.68) is

$$L' = L \circ C. \tag{1.70}$$

We can see this by substituting $L'$ into equation (1.68):

$$L' \circ \Gamma[q'] = L \circ C \circ \Gamma[q'] = L \circ \Gamma[q]. \tag{1.71}$$


To deduce the local-tuple transformation C given a coordinate 
transformation F, we deduce how each component of the local tu- 
ple transforms. Of course, the coordinate transformation specifies 
how the coordinate component of the local tuple transforms. The 
generalized velocity component of the local-tuple transformation 
can be deduced as follows. Let q and q’ be the same configura- 
tion path expressed in the two coordinate systems. Substituting 
these paths into the coordinate transformation and computing the 
derivative we find 


$$Dq(t) = \partial_0 F\big(t, q'(t)\big) + \partial_1 F\big(t, q'(t)\big)\, Dq'(t). \tag{1.72}$$
Through any point there is always a path of any given velocity, so we may generalize, and conclude that along corresponding coordinate paths the generalized velocities satisfy

$$v = \partial_0 F(t, x') + \partial_1 F(t, x')\, v'. \tag{1.73}$$
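In components, (1.73) is the multivariable chain rule: $\partial_1 F(t, x')$ acts on $v'$ like the Jacobian matrix of $F$ with respect to $x'$. A sketch in index notation, where the indices $i$ and $j$ are introduced here for illustration only:

```latex
v^i = \frac{\partial F^i}{\partial t}(t, x') + \sum_j \frac{\partial F^i}{\partial x'^j}(t, x')\, v'^j .
```

The first term, $\partial_0 F$, vanishes whenever the transformation has no explicit time dependence, as is the case for the polar-to-rectangular transformation.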


If needed, rules for higher derivative components of the local tuple 
can be determined in a similar fashion. The local-tuple transfor- 
mation that takes a local tuple in the primed system to a local 
tuple in the unprimed system is constructed from the component 
transformations: 


$$C(t, x', v', \ldots) = \big(t,\ F(t, x'),\ \partial_0 F(t, x') + \partial_1 F(t, x')\, v',\ \ldots\big). \tag{1.74}$$


So if we take the Lagrangian $L'$ to be

$$L' = L \circ C, \tag{1.75}$$


then the action has a value that is independent of the coordinate 
system used to compute it. The configuration path of stationary 
action does not depend on which coordinate system is used to 
describe the path. The Lagrange equations derived from these 
Lagrangians will in general look very different from one another, 
but they must be equivalent. 


Exercise 1.14: 


Show by direct calculation that the Lagrange equations for L’ are satis- 
fied if the Lagrange equations for L are satisfied. 


Given a coordinate transformation $F$, we can use equation (1.74) to find the function $C$, which transforms local tuples. The procedure F->C implements this:⁶⁵


(define ((F->C F) local)
  (->local (time local)
           (F local)
           (+ (((partial 0) F) local)
              (* (((partial 1) F) local)
                 (velocity local)))))


65 As described in footnote 28 the procedure ->local constructs a local tuple 
from an initial segment of time, coordinates, and velocities. 
As an illustration, consider the transformation from polar to rectangular coordinates: $x = r\cos\varphi$ and $y = r\sin\varphi$, with the following implementation:


(define (p->r local)
  (let ((polar-tuple (coordinate local)))
    (let ((r (ref polar-tuple 0))
          (phi (ref polar-tuple 1)))
      (let ((x (* r (cos phi)))
            (y (* r (sin phi))))
        (up x y)))))


In terms of the polar coordinates and the rates of change of the po- 
lar coordinates, the rates of change of the rectangular components 
are: 


(show-expression
 (velocity
  ((F->C p->r)
   (->local 't (up 'r 'phi) (up 'rdot 'phidot)))))


$$\begin{pmatrix} -\dot\varphi\, r\sin(\varphi) + \dot r\cos(\varphi) \\ \dot\varphi\, r\cos(\varphi) + \dot r\sin(\varphi) \end{pmatrix}$$
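The same result can be reproduced by hand from (1.73). Since p->r has no explicit time dependence, $\partial_0 F = 0$, and $\partial_1 F$ is the Jacobian of $(r, \varphi) \mapsto (r\cos\varphi, r\sin\varphi)$, written here as an ordinary matrix for convenience:

```latex
\begin{pmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{pmatrix}
\begin{pmatrix} \dot r \\ \dot\varphi \end{pmatrix}
=
\begin{pmatrix} \dot r\cos\varphi - r\dot\varphi\sin\varphi \\ \dot r\sin\varphi + r\dot\varphi\cos\varphi \end{pmatrix},
```

which agrees both with the displayed output and with (1.65).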


We can use F->C to find the Lagrangian for central force motion in polar coordinates from the Lagrangian in rectangular components, using equation (1.70):

(define (L-central-polar m U)
  (compose (L-central-rectangular m U) (F->C p->r)))


(show-expression
 ((L-central-polar 'm (literal-function 'U))
  (->local 't (up 'r 'phi) (up 'rdot 'phidot))))


$$\tfrac{1}{2} m\dot\varphi^2 r^2 + \tfrac{1}{2} m\dot r^2 - U(r)$$


The result is the same as Lagrangian (1.67). 


Exercise 1.15: Central force motion 


Find Lagrangians for central force motion in three dimensions in rectangular coordinates and in spherical coordinates. First, find the Lagrangians analytically, then check the results with the computer by generalizing the programs that we have presented.


1.6.2 Systems with Rigid Constraints 


We have found that $L = T - V$ is a suitable Lagrangian for a system of point particles subject to forces derived from a potential. Extended bodies can sometimes be conveniently idealized as a system of point particles connected by rigid constraints. We will find that $L = T - V$, expressed in irredundant coordinates, is a suitable Lagrangian for modeling systems of point particles with rigid constraints. We will first illustrate the method and then provide a justification.


Lagrangians for rigidly constrained systems 

The system is presumed to be made of $N$ point masses, indexed by $\alpha$, in ordinary three-dimensional space. The first step is to choose a convenient set of irredundant generalized coordinates $q$ and redescribe the system in terms of these. In terms of the generalized coordinates the rectangular coordinates of particle $\alpha$ are:

$$x_\alpha = f_\alpha(t, q). \tag{1.76}$$


For irredundant coordinates $q$ all the coordinate constraints are built into the functions $f_\alpha$. We deduce the relationship of the generalized velocities $v$ to the velocities of the constituent particles $v_\alpha$ by inserting path functions into equation (1.76), differentiating, and abstracting to arbitrary velocities. We find

$$v_\alpha = \partial_0 f_\alpha(t, q) + \partial_1 f_\alpha(t, q)\, v. \tag{1.77}$$


We use equations (1.76) and (1.77) to express the kinetic energy 
in terms of the generalized coordinates and velocities. Let T be 
the kinetic energy as a function of the rectangular coordinates and 
velocities: 


$$T(t; x_0, \ldots, x_{N-1}; v_0, \ldots, v_{N-1}) = \sum_\alpha \tfrac{1}{2} m_\alpha v_\alpha^2, \tag{1.78}$$

where $v_\alpha^2$ is the squared magnitude of $v_\alpha$. As a function of the generalized coordinate tuple $q$ and the generalized velocity tuple

66 See section 1.6.1. 


Figure 1.2 The pendulum is driven by vertical motion of the pivot. The pivot slides on the y-axis. Although the bob is drawn as a blob it is modeled as a point mass. The bob is acted on by the uniform acceleration g of gravity in the negative ŷ direction.


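$v$ the kinetic energy is

$$T(t, q, v) = T\big(t;\ f(t, q);\ \partial_0 f(t, q) + \partial_1 f(t, q)\, v\big) = \sum_\alpha \tfrac{1}{2} m_\alpha \big(\partial_0 f_\alpha(t, q) + \partial_1 f_\alpha(t, q)\, v\big)^2. \tag{1.79}$$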
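Expanding the square in (1.79) shows that the kinetic energy is at most quadratic in the generalized velocities. A sketch of the expansion; the names $T_0$, $T_1$, $T_2$ are introduced here for illustration only:

```latex
\begin{aligned}
T(t, q, v) &= T_0(t, q) + T_1(t, q, v) + T_2(t, q, v), \\
T_0(t, q) &= \sum_\alpha \tfrac12 m_\alpha \big(\partial_0 f_\alpha(t, q)\big)^2, \\
T_1(t, q, v) &= \sum_\alpha m_\alpha\, \partial_0 f_\alpha(t, q) \cdot \big(\partial_1 f_\alpha(t, q)\, v\big), \\
T_2(t, q, v) &= \sum_\alpha \tfrac12 m_\alpha \big(\partial_1 f_\alpha(t, q)\, v\big)^2 .
\end{aligned}
```

When the $f_\alpha$ do not depend explicitly on time, $\partial_0 f_\alpha = 0$ and only $T_2$ survives. The driven pendulum that follows is a case where all three parts appear.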


Similarly, we use equation (1.76) to reexpress the potential energy in terms of the generalized coordinates. Let $V(t, x)$ be the potential energy at time $t$ in the configuration specified by the tuple of rectangular coordinates $x$. Expressed in generalized coordinates the potential energy is

$$V(t, q, v) = V\big(t, f(t, q)\big). \tag{1.80}$$

We take the Lagrangian to be the difference of the kinetic energy and the potential energy: $L = T - V$.


A pendulum driven at the pivot 

Consider a pendulum (see figure 1.2) of length $l$ and mass $m$, modeled as a point mass, supported by a pivot that is driven in the vertical direction by a given function of time $y_s$.
The dimension of the configuration space for this system is one; we choose $\theta$, shown in figure 1.2, as the generalized coordinate. The position of the bob is given, in rectangular coordinates, by

$$x = l\sin\theta \quad\text{and}\quad y = y_s(t) - l\cos\theta. \tag{1.81}$$

The velocities are

$$v_x = l\dot\theta\cos\theta \quad\text{and}\quad v_y = Dy_s(t) + l\dot\theta\sin\theta, \tag{1.82}$$

obtained by differentiating along a path and abstracting to velocities at the moment.

The kinetic energy is $T(t; x, y; v_x, v_y) = \frac{1}{2}m(v_x^2 + v_y^2)$. Expressed in generalized coordinates the kinetic energy is

$$T(t, \theta, \dot\theta) = \tfrac{1}{2} m\big(l^2\dot\theta^2 + (Dy_s(t))^2 + 2l\,Dy_s(t)\,\dot\theta\sin\theta\big). \tag{1.83}$$
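Expression (1.83) follows from squaring the components in (1.82) and using $\cos^2\theta + \sin^2\theta = 1$:

```latex
\begin{aligned}
v_x^2 + v_y^2 &= l^2\dot\theta^2\cos^2\theta + \big(Dy_s(t) + l\dot\theta\sin\theta\big)^2 \\
&= l^2\dot\theta^2 + \big(Dy_s(t)\big)^2 + 2l\,Dy_s(t)\,\dot\theta\sin\theta .
\end{aligned}
```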


The potential energy is $V(t; x, y) = mgy$. Expressed in generalized coordinates the potential energy is

$$V(t, \theta, \dot\theta) = gm\big(y_s(t) - l\cos\theta\big). \tag{1.84}$$

A Lagrangian is $L = T - V$. The Lagrangian is expressed as


(define ((T-pend m l g ys) local)
  (let ((t (time local))
        (theta (coordinate local))
        (thetadot (velocity local)))
    (let ((vys (D ys)))
      (* 1/2 m
         (+ (square (* l thetadot))
            (square (vys t))
            (* 2 l (vys t) thetadot (sin theta)))))))

(define ((V-pend m l g ys) local)
  (let ((t (time local))
        (theta (coordinate local)))
    (* m g (- (ys t) (* l (cos theta))))))

(define L-pend (- T-pend V-pend))
Lagrange’s equation for this system is:⁶⁷


(show-expression
 (((Lagrange-equations
    (L-pend 'm 'l 'g (literal-function 'y_s)))
   (literal-function 'theta))
  't))


$$D^2\theta(t)\, l^2 m + D^2 y_s(t)\sin\big(\theta(t)\big)\, l m + \sin\big(\theta(t)\big)\, g l m$$
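The displayed expression can be checked by hand from $L = T - V$ built from (1.83) and (1.84). A sketch of the steps:

```latex
\begin{aligned}
\partial_2 L(t, \theta, \dot\theta) &= m l^2\dot\theta + m l\, Dy_s(t)\sin\theta, \\
\partial_1 L(t, \theta, \dot\theta) &= m l\, Dy_s(t)\,\dot\theta\cos\theta - g m l\sin\theta, \\
D(\partial_2 L \circ \Gamma[\theta])(t) - (\partial_1 L \circ \Gamma[\theta])(t)
&= m l^2 D^2\theta(t) + m l\, D^2 y_s(t)\sin\theta(t) + g m l\sin\theta(t).
\end{aligned}
```

The $Dy_s\,\dot\theta\cos\theta$ terms cancel. Setting the result to zero and dividing by $ml$ gives $l\,D^2\theta(t) + \big(g + D^2 y_s(t)\big)\sin\theta(t) = 0$, so the vertical drive acts like a time-varying addition to the gravitational acceleration.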


Exercise 1.16: 
Derive the Lagrangians in exercise 1.9. 


Exercise 1.17: Bead on a helical wire 


A bead of mass m is constrained to move on a frictionless helical wire. 
The helix is oriented so that its axis is horizontal. The diameter of the 
helix is d and its pitch (turns per unit length) is h. The system is in 
a uniform gravitational field with vertical acceleration g. Formulate a 
Lagrangian that describes the system and find the Lagrange equations 
of motion. 


Exercise 1.18: Bead on a triaxial surface 


A bead of mass m moves without friction on a triaxial ellipsoidal surface. 
In rectangular coordinates the surface satisfies 


$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$


for some constants a, b, and c. Identify suitable generalized coordinates, 
formulate a Lagrangian, and find Lagrange’s equations. 


Exercise 1.19: A two-bar linkage 


The two-bar linkage shown in figure 1.3 is constrained to move in the 
plane. It is composed of three small massive bodies interconnected by 
two massless rigid rods in a uniform gravitational field with vertical 
acceleration g. The rods are pinned to the central body by a hinge that 
allows the linkage to fold. The system is arranged so that the hinge is 
completely free: the members can go through all configurations without 


67 We hope you appreciate the TeX magic here. A symbol with an underline character is converted by show-expression to a subscript. Symbols with carets, the names of Greek letters, and terminating in the characters "dot" are similarly mistreated.
Figure 1.3 A two-bar linkage is modeled by three point masses connected by rigid massless struts. This linkage is subject to a uniform vertical gravitational acceleration.


Figure 1.4 This pendulum is pivoted on a point particle of mass $m_1$ that is allowed to slide on a horizontal rail. The pendulum bob is a point particle of mass $m_2$ that is acted on by the vertical force of gravity.


collision. Formulate a Lagrangian that describes the system and find 
the Lagrange equations of motion. Use the computer to do this, because 
the equations are rather big. 


Exercise 1.20: Sliding pendulum 


Consider a pendulum of length $l$ attached to a support that is free to move horizontally, shown in figure 1.4. Let the mass of the support be $m_1$ and the mass of the pendulum be $m_2$. Formulate a Lagrangian and derive Lagrange’s equations for this system.


Why it works 
In this section we show that $L = T - V$ is in fact a suitable Lagrangian for rigidly constrained systems. We do this by requiring that the Lagrange equations be equivalent to the Newtonian vectorial dynamics with vector constraint forces.⁶⁸

We consider a system of particles. The particle with index $\alpha$ has mass $m_\alpha$ and position $\vec x_\alpha(t)$ at time $t$. There may be a very large number of these particles, or just a few. Some of the positions may also be specified functions of time, such as the position of the pivot of a driven pendulum. There are rigid position constraints among some of the particles; we assume all of these constraints are of the form

$$\big(\vec x_\alpha(t) - \vec x_\beta(t)\big) \cdot \big(\vec x_\alpha(t) - \vec x_\beta(t)\big) = l_{\alpha\beta}^2, \tag{1.86}$$

that is, the distance between particles $\alpha$ and $\beta$ is $l_{\alpha\beta}$.

The Newtonian equation of motion for particle $\alpha$ says that the mass times the acceleration of particle $\alpha$ is equal to the sum of the potential forces and the constraint forces. The potential forces are derived as the negative gradient of the potential energy, and may depend on the positions of the other particles and the time. The constraint forces $\vec F_{\alpha\beta}$ are the vector constraint forces associated with the rigid constraint between particle $\alpha$ and particle $\beta$. So

$$D(m_\alpha D\vec x_\alpha)(t) = -\nabla_{\vec x_\alpha} V\big(t, \vec x_0(t), \ldots, \vec x_{N-1}(t)\big) + \sum_{\{\beta \mid \beta \leftrightarrow \alpha\}} \vec F_{\alpha\beta}(t), \tag{1.87}$$

where in the summation $\beta$ ranges over only those particle indices for which there are rigid constraints with the particle indexed by $\alpha$; we use the notation $\beta \leftrightarrow \alpha$ for the relation that there is a rigid constraint between the indicated particles.


68 We will simply accept the Newtonian procedure for systems with rigid constraints and find Lagrangians that are equivalent. Of course, actual bodies are
never truly rigid, so we may wonder what detailed approximations have to be 
made to treat them as truly rigid. For instance, a more satisfying approach 
would be to replace the rigid distance constraints by very stiff springs. We 
could then immediately write the Lagrangian as L = T — V, and we should 
be able to derive the Newtonian procedure for systems with rigid constraints 
as an approximation. However, this is too complicated to do at this stage, so 
we accept the Newtonian idealization. 
The force of constraint is directed along the line between the particles, so we may write

$$\vec F_{\alpha\beta}(t) = F_{\alpha\beta}(t)\,\frac{\vec x_\beta(t) - \vec x_\alpha(t)}{l_{\alpha\beta}}, \tag{1.88}$$

where $F_{\alpha\beta}(t)$ is the scalar magnitude of the tension in the constraint at time $t$. Note that $\vec F_{\alpha\beta} = -\vec F_{\beta\alpha}$. In general, the scalar constraint forces change as the system evolves.

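Formally, we can reproduce Newton’s equations with the Lagrangian⁶⁹

$$L(t; x, F; \dot x, \dot F) = \sum_\alpha \tfrac{1}{2} m_\alpha \dot{\vec x}_\alpha^{\,2} - V(t, x) - \sum_{\{\alpha, \beta \mid \alpha < \beta,\ \alpha \leftrightarrow \beta\}} \frac{F_{\alpha\beta}}{2 l_{\alpha\beta}} \Big[\big(\vec x_\beta - \vec x_\alpha\big)^2 - l_{\alpha\beta}^2\Big], \tag{1.89}$$

where the constraint forces are being treated as additional generalized coordinates. Here $x$ is a structure composed of all of the rectangular components $x_\alpha$ of all of the $\vec x_\alpha$, $\dot x$ is a structure composed of all the rectangular components $\dot x_\alpha$ of all of the velocity vectors $\vec v_\alpha$, and $F$ is a structure composed of all of the $F_{\alpha\beta}$. The velocity of $F$ does not appear in the Lagrangian, and $F$ itself only appears linearly. So the Lagrange equations associated with $F$ are

$$\big(\vec x_\beta(t) - \vec x_\alpha(t)\big)^2 - l_{\alpha\beta}^2 = 0, \tag{1.90}$$

but this is just a restatement of the constraints. The Lagrange equations for the particle coordinates are Newton’s equations (1.87):

$$D(m_\alpha D x_\alpha)(t) = -\partial_{1,\alpha} V\big(t, x(t)\big) + \sum_{\{\beta \mid \alpha \leftrightarrow \beta\}} F_{\alpha\beta}(t)\,\frac{x_\beta(t) - x_\alpha(t)}{l_{\alpha\beta}}. \tag{1.91}$$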
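To see where the constraint force in (1.91) comes from, consider a single constrained pair $\alpha < \beta$ and differentiate its constraint term in (1.89) with respect to $\vec x_\alpha$ and with respect to $F_{\alpha\beta}$. A sketch:

```latex
\begin{aligned}
\frac{\partial}{\partial \vec x_\alpha}\left(-\frac{F_{\alpha\beta}}{2 l_{\alpha\beta}}\Big[(\vec x_\beta - \vec x_\alpha)^2 - l_{\alpha\beta}^2\Big]\right)
&= F_{\alpha\beta}\,\frac{\vec x_\beta - \vec x_\alpha}{l_{\alpha\beta}} = \vec F_{\alpha\beta}, \\
\frac{\partial}{\partial F_{\alpha\beta}}\left(-\frac{F_{\alpha\beta}}{2 l_{\alpha\beta}}\Big[(\vec x_\beta - \vec x_\alpha)^2 - l_{\alpha\beta}^2\Big]\right)
&= -\frac{(\vec x_\beta - \vec x_\alpha)^2 - l_{\alpha\beta}^2}{2 l_{\alpha\beta}} .
\end{aligned}
```

The first line is the force (1.88) on particle $\alpha$; differentiating with respect to $\vec x_\beta$ instead gives $-\vec F_{\alpha\beta} = \vec F_{\beta\alpha}$. Because $\dot F$ does not appear, the Lagrange equation for $F_{\alpha\beta}$ simply sets the second line to zero, which is (1.90).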


69 This Lagrangian is purely formal and does not represent a model of the
constraint forces. In particular, note that the constraint terms do not look 
like a potential of constraint with a minimum when the constraint is exactly 
satisfied. Rather, the constraint terms in the Lagrangian are zero when the 
constraint is satisfied, and can be either positive or negative depending on 
whether the distance between the particles is larger or smaller than the con- 
straint distance. 
Now that we have a suitable Lagrangian, we can use the fact that Lagrangians can be reexpressed in any generalized coordinates to find a simpler Lagrangian. The strategy is to choose a new set of coordinates for which many of the coordinates are constants and the remaining coordinates are irredundant.

Let $q$ be a tuple of generalized coordinates that specify the degrees of freedom of the system without redundancy. Let $c$ be a tuple of other generalized coordinates that specify the distances between particles for which constraints are specified. The $c$ coordinates will have constant values. The combination of $q$ and $c$ replaces the redundant rectangular coordinates $x$.⁷⁰ In addition, we still have the $F$ coordinates, which are the scalar constraint forces. Our new coordinates are the components of $q$, $c$, and $F$.

There exist functions $f_\alpha$ that give the rectangular coordinates of the constituent particles in terms of $q$ and $c$:

$$x_\alpha = f_\alpha(t, q, c). \tag{1.92}$$

To reexpress the Lagrangian in terms of $q$, $c$, and $F$ we need to find $v_\alpha$ in terms of the generalized velocities $\dot q$ and $\dot c$: we do this by differentiating $f_\alpha$ along a path and abstracting to arbitrary velocities (see section 1.6.1):

$$v_\alpha = \partial_0 f_\alpha(t, q, c) + \partial_1 f_\alpha(t, q, c)\,\dot q + \partial_2 f_\alpha(t, q, c)\,\dot c. \tag{1.93}$$
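Substituting these into Lagrangian (1.89), and using

$$c_{\alpha\beta} = \big\lVert \vec x_\beta - \vec x_\alpha \big\rVert, \tag{1.94}$$

we find

$$\begin{aligned} L'(t; q, c, F; \dot q, \dot c, \dot F) = {} & \sum_\alpha \tfrac{1}{2} m_\alpha \big(\partial_0 f_\alpha(t, q, c) + \partial_1 f_\alpha(t, q, c)\,\dot q + \partial_2 f_\alpha(t, q, c)\,\dot c\big)^2 \\ & - V\big(t, f(t, q, c)\big) - \sum_{\{\alpha, \beta \mid \alpha < \beta,\ \alpha \leftrightarrow \beta\}} \frac{F_{\alpha\beta}}{2 l_{\alpha\beta}} \big[c_{\alpha\beta}^2 - l_{\alpha\beta}^2\big]. \end{aligned} \tag{1.95}$$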

70 Typically the number of components of $x$ is equal to the sum of the number of components of $q$ and $c$; adding a strut removes a degree of freedom and adds a distance constraint. However, there are singular cases for which the addition of a single strut can remove more than a single degree of freedom. We do not consider the singular cases here.
The Lagrange equations are derived by the usual procedure. Rather than write out all the gory details, let’s think about how it will go.

The Lagrange equations associated with $F$ just restate the constraints:

$$0 = \frac{c_{\alpha\beta}^2(t) - l_{\alpha\beta}^2}{2 l_{\alpha\beta}}, \tag{1.96}$$

and consequently we know that along a solution path $c(t) = l$, and $Dc(t) = D^2c(t) = 0$. We can use this result to simplify the Lagrange equations associated with $q$ and $c$.

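The Lagrange equations associated with $q$ are the same as if they were derived from the Lagrangian⁷¹

$$L''(t, q, \dot q) = \sum_\alpha \tfrac{1}{2} m_\alpha \big(\partial_0 f_\alpha(t, q, l) + \partial_1 f_\alpha(t, q, l)\,\dot q\big)^2 - V\big(t, f(t, q, l)\big), \tag{1.97}$$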


but this is exactly $T - V$ where $T$ and $V$ are computed from the generalized coordinates $q$, with fixed constraints. Notice that the constraint forces do not appear in the Lagrange equations for $q$ because in the Lagrange equations they are multiplied by a term that is identically zero on the solution paths. So the Lagrange equations for $T - V$ with irredundant generalized coordinates $q$ and fixed constraints are equivalent to Newton’s equations with vector constraint forces.

The Lagrange equations for $c$ can be used to find the constraint forces. The Lagrange equations are a big mess so we will not show them explicitly, but in general they are equations in $D^2c$, $Dc$, and $c$ that will depend upon $q$, $Dq$, and $F$. The dependence on $F$ is linear, so we can solve for $F$ in terms of the solution path $q$ and $Dq$, with $c = l$ and $Dc = D^2c = 0$.

If we are not interested in the constraint forces, we can abandon 
the full Lagrangian (1.95) in favor of Lagrangian (1.97), which is 


71 Consider a function $g$ of, say, three arguments, and let $g_0$ be a function of two arguments satisfying $g_0(x, y) = g(x, y, 0)$. Then $(\partial_0 g_0)(x, y) = (\partial_0 g)(x, y, 0)$. The substitution of a value in an argument commutes with the taking of the partial derivative with respect to a different argument. In deriving the Lagrange equations for $q$ we can set $c = l$ and $\dot c = 0$ in the Lagrangian, but we cannot do this in deriving the Lagrange equations associated with $c$, because we have to take derivatives with respect to those arguments.
equivalent as far as the evolution of the generalized coordinates $q$ is concerned.

The same derivation goes through even if the lengths $l_{\alpha\beta}$ specified in the interparticle distance constraints are functions of time. It can also be generalized to allow distance constraints to time-dependent positions, by making some of the positions of particles $\vec x_\beta$ specified functions of time.


More generally 
Consider a constraint of the form

$$\varphi\big(t, x(t)\big) = 0, \tag{1.98}$$

where $x(t)$ is the structure of all the rectangular components $x_\alpha(t)$ at time $t$. In section 1.10 we will show, using the variational principle, that an appropriate Lagrangian for this system is

$$L(t; x, \lambda; \dot x, \dot\lambda) = \sum_\alpha \tfrac{1}{2} m_\alpha \dot x_\alpha^2 - V(t, x) + \lambda\,\varphi(t, x), \tag{1.99}$$

where $\lambda$ is an additional coordinate and $\dot\lambda$ is the corresponding generalized velocity. The Lagrange equations associated with $\lambda$ are just a restatement of the constraints: $\varphi(t, x(t)) = 0$. The Lagrange equations for the particle coordinates are:

$$D(m_\alpha D x_\alpha)(t) = -\partial_{1,\alpha} V\big(t, x(t)\big) + \lambda(t)\,\partial_{1,\alpha}\varphi\big(t, x(t)\big). \tag{1.100}$$


Such a constraint can also be modeled by including appropriate constraint forces in Newton’s equations:

$$D(m_\alpha D\vec x_\alpha)(t) = -\nabla_{\vec x_\alpha} V\big(t, \vec x_0(t), \ldots, \vec x_{N-1}(t)\big) + \vec F_\alpha(t). \tag{1.101}$$

For equations (1.100) to be the same as equations (1.101) we must identify $\lambda(t)\,\partial_{1,\alpha}\varphi(t, x(t))$ with the forces of constraint on particle $\alpha$. Notice that these forces of constraint are proportional to the normal to the constraint surface at each instant and thus do no work for motions that obey the constraint.
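A short argument for the no-work statement, sketched here for a constraint $\varphi$ with no explicit time dependence: differentiating (1.98) along a path that obeys the constraint gives

```latex
0 = D\big(\varphi(t, x(t))\big)
  = \partial_0\varphi\big(t, x(t)\big) + \sum_\alpha \partial_{1,\alpha}\varphi\big(t, x(t)\big)\, Dx_\alpha(t),
```

so when $\partial_0\varphi = 0$ the total power delivered by the constraint forces, $\sum_\alpha \lambda(t)\,\partial_{1,\alpha}\varphi(t, x(t))\, Dx_\alpha(t)$, vanishes. For a time-dependent constraint the $\partial_0\varphi$ term need not vanish, and the statement applies to velocities tangent to the instantaneous constraint surface.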

Lagrangian (1.89), which we developed above to include New- 
tonian forces of constraint for position constraints, is exactly of 
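this form. We can identify

$$\lambda\varphi(t, x) = -\sum_{\{\alpha, \beta \mid \alpha < \beta,\ \alpha \leftrightarrow \beta\}} \frac{F_{\alpha\beta}}{2 l_{\alpha\beta}} \Big[\big(\vec x_\beta - \vec x_\alpha\big)^2 - l_{\alpha\beta}^2\Big]. \tag{1.102}$$

The forces of constraint satisfy

$$\lambda(t)\,\partial_{1,\alpha}\varphi\big(t, x(t)\big) = \sum_{\{\beta \mid \alpha \leftrightarrow \beta\}} F_{\alpha\beta}(t)\,\frac{\vec x_\beta(t) - \vec x_\alpha(t)}{l_{\alpha\beta}}. \tag{1.103}$$

Figure 1.5 A rigid rod of length $l$ constrains two massive particles in a plane.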

Accepting Lagrangian (1.99) as describing systems with constraints of the form (1.98), we can make a coordinate transformation from the redundant coordinates $x$ to irredundant generalized coordinates $q$ and constraint coordinates $c = \varphi(t, x)$, as above. The coordinate $\lambda$ will not appear in the Lagrange equations for $q$ because on solution paths it will be multiplied by a factor that is identically zero. If we are interested only in the evolution
of the generalized coordinates we can assume the constraints are 
identically satisfied and take the Lagrangian to be the difference 
of the kinetic and potential energies expressed in terms of the 
generalized coordinates. 


Exercise 1.21: The dumbbell 


In this exercise we will recapitulate the derivation of the Lagrangian for 
constrained systems for a particular simple system. 

Consider two massive particles in the plane constrained by a massless 
rigid rod to remain a distance l apart, as in figure 1.5. There are appar- 
ently four degrees of freedom for two massive particles in the plane, but 
the rigid rod reduces this number to three. 
We can uniquely specify the configuration with the redundant coordinates of the particles, say $x_0(t)$, $y_0(t)$ and $x_1(t)$, $y_1(t)$. The constraint $(x_1(t) - x_0(t))^2 + (y_1(t) - y_0(t))^2 = l^2$ eliminates one degree of freedom.


a. Write Newton’s equations for the balance of forces for the four rect- 
angular coordinates of the two particles, given that the scalar tension in 
the rod is F. 


b. Write the formal Lagrangian

$$L(t; x_0, y_0, x_1, y_1, F; \dot x_0, \dot y_0, \dot x_1, \dot y_1, \dot F)$$

such that Lagrange’s equations will yield Newton’s equations as you derived them in part a.


c. Make a change of coordinates to a coordinate system with center of mass coordinates $x_{CM}$, $y_{CM}$, angle $\theta$, distance between the particles $c$, and tension force $F$. Write the Lagrangian in these coordinates, and write the Lagrange equations.


d. You may deduce from one of these equations that $c(t) = l$. From this fact we get that $Dc = 0$ and $D^2c = 0$. Substitute these into the Lagrange equations you just computed to get the equations of motion for $x_{CM}$, $y_{CM}$, $\theta$.


e. Make a Lagrangian ($= T - V$) for the system described with the irredundant generalized coordinates $x_{CM}$, $y_{CM}$, $\theta$ and compute the Lagrange equations from this Lagrangian. They should be the same equations as you derived for the same coordinates in part d.


Exercise 1.22: Driven pendulum 


Show that the Lagrangian (1.89) can be used to describe the driven 
pendulum, where the position of the pivot is a specified function of 
time: Derive the equations of motion using the Newtonian constraint 
force prescription, and show that they are the same as the Lagrange 
equations. Be sure to examine the equations for the constraint forces as 
well as the position of the pendulum bob. 


Exercise 1.23: Fill in the details 


Show that the Lagrange equations for Lagrangian (1.97) are the same as the Lagrange equations for Lagrangian (1.95) with the substitution $c(t) = l$, $Dc(t) = D^2c(t) = 0$.


Exercise 1.24: Constraint forces 


Find the tension in an undriven planar pendulum. 


60 Chapter 1 Lagrangian Mechanics 


1.6.3 Constraints as Coordinate Transformations 


The derivation of a Lagrangian for a constrained system involves 
steps that are analogous to the steps in the derivation of a coor- 
dinate transformation. 

For a constrained system one specifies the rectangular coordi- 
nates of the constituent particles in terms of generalized coordi- 
nates that incorporate the constraints. We then determine the 
rectangular velocities of the constituent particles as functions of the generalized coordinates and the generalized velocities. The Lagrangian that we know how to express in rectangular coordinates
and velocities of the constituent particles can then be reexpressed 
in terms of the generalized coordinates and velocities. 

To carry out a coordinate transformation one specifies how the 
configuration of a system expressed in one set of generalized coor- 
dinates can be reexpressed in terms of another set of generalized 
coordinates. We then determine the transformation of general- 
ized velocities implied by the transformation of generalized coor- 
dinates. A Lagrangian that is expressed in terms of one of the 
sets of generalized coordinates can then be reexpressed in terms 
of the other set of generalized coordinates. 

These are really two applications of the same process, so we 
can make Lagrangians for constrained systems by composing a 
Lagrangian for unconstrained particles with a coordinate trans- 
formation that incorporates the constraint. Our deduction that 
$L = T - V$ is a suitable Lagrangian for constrained systems was
in fact based on a coordinate transformation from a set of coor- 
dinates subject to constraints to a set of irredundant coordinates 
plus constraint coordinates that are constant. 

Let $x_\alpha$ be the tuple of rectangular components of the constituent particle with index $\alpha$, and $v_\alpha$ be its velocity. The Lagrangian

$$L_f(t; x_0, \ldots, x_{N-1}; v_0, \ldots, v_{N-1}) = \sum_\alpha \tfrac{1}{2} m_\alpha v_\alpha^2 - V(t; x_0, \ldots, x_{N-1}; v_0, \ldots, v_{N-1}) \tag{1.104}$$


is the difference of kinetic and potential energies of the constituent 
particles. This is a suitable Lagrangian for a set of unconstrained 
free particles with potential energy V. 

Let q be a tuple of irredundant generalized coordinates, and v 
be the corresponding generalized velocity tuple. The coordinates 
$q$ are related to $x_\alpha$, the coordinates of the constituent particles, by $x_\alpha = f_\alpha(t, q)$, as before. The constraints among the constituent particles are taken into account in the definition of the $f_\alpha$. Here we view this as a coordinate transformation. What is unusual about this as a coordinate transformation is that the dimension of $x$ is not the same as the dimension of $q$. From this coordinate transformation we can find the local-tuple transformation function (see section 1.6.1)

$$(t; x_0, \ldots, x_{N-1}; v_0, \ldots, v_{N-1}) = C(t, q, v). \tag{1.105}$$


A Lagrangian for the constrained system can be obtained from 
the Lagrangian for the unconstrained system by composing it with 
the local-tuple transformation function from constrained coordi- 
nates to unconstrained coordinates: 


$$L = L_f \circ C. \tag{1.106}$$


The constraints enter only in the transformation. 

To illustrate this we will find a Lagrangian for the driven pendulum introduced in section 1.6.2. The $T - V$ Lagrangian for a free particle of mass $m$ in a vertical plane subject to a gravitational potential with acceleration $g$ is

$$L_f(t; x, y; v_x, v_y) = \tfrac{1}{2} m(v_x^2 + v_y^2) - mgy, \tag{1.107}$$


where y measures the vertical height of the point mass. As a 
program 


(define ((Lf m g) local)
  (let ((q (coordinate local))
        (v (velocity local)))
    (let ((y (ref q 1)))
      (- (* 1/2 m (square v)) (* m g y)))))


The coordinate transformation from generalized coordinate $\theta$ to rectangular coordinates is $x = l\sin\theta$, $y = y_s(t) - l\cos\theta$, where $l$ is the length of the pendulum and $y_s$ gives the height of the support as a function of time. It is interesting that the drive enters only through the specification of the constraints. As a program
(define ((dp-coordinates l y_s) local)
  (let ((t (time local))
        (theta (coordinate local)))
    (let ((x (* l (sin theta)))
          (y (- (y_s t) (* l (cos theta)))))
      (up x y))))


Using F->C we can deduce the local-tuple transformation and define the Lagrangian for the driven pendulum by composition:

(define (L-pend m l g y_s)
  (compose (Lf m g)
           (F->C (dp-coordinates l y_s))))


The Lagrangian is

(show-expression
 ((L-pend 'm 'l 'g (literal-function 'y_s))
  (->local 't 'theta 'thetadot)))


$$g l m\cos(\theta) - g m\, y_s(t) + \tfrac{1}{2} l^2 m\dot\theta^2 + l m\dot\theta\, Dy_s(t)\sin(\theta) + \tfrac{1}{2} m\big(Dy_s(t)\big)^2$$
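To see the agreement with section 1.6.2 term by term, one can expand (1.83) minus (1.84):

```latex
T - V = \tfrac12 m l^2\dot\theta^2 + \tfrac12 m\big(Dy_s(t)\big)^2 + m l\, Dy_s(t)\,\dot\theta\sin\theta
        - g m\, y_s(t) + g m l\cos\theta ,
```

which contains the same five terms as the displayed output, in a different order.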


This is the same as the Lagrangian found in section 1.6.2. 

We have found a very interesting decomposition of the La- 
grangian for constrained systems. One part consists of the dif- 
ference of the kinetic and potential energy of the constituents. 
The other part describes the constraints that are specific to the 
configuration of a particular system. 


1.6.4 The Lagrangian is Not Unique 


Lagrangians are not in a one-to-one relationship with physical 
systems—many Lagrangians can be used to describe the same 
physical system. In this section we will demonstrate this by show- 
ing that the addition to the Lagrangian of a “total time deriva- 
tive” of a function of the coordinates and time does not change 
the paths of stationary action or the equations of motion deduced 
from the action principle. 
Total time derivatives

Let’s first explain what we mean by a “total time derivative.” Let 
F be a function of time and coordinates. Then the time derivative 
of F along a path q is 


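$$D(F \circ \Gamma[q]) = (DF \circ \Gamma[q])\, D\Gamma[q]. \tag{1.108}$$
Because F only depends on time and coordinates:

$$DF \circ \Gamma[q] = [\partial_0 F \circ \Gamma[q],\ \partial_1 F \circ \Gamma[q]]. \tag{1.109}$$
So we only need the first two components of $D\Gamma[q]$,

$$(D\Gamma[q])(t) = (1, Dq(t), D^2 q(t), \ldots), \tag{1.110}$$
to form the product

$$\begin{aligned} D(F \circ \Gamma[q]) &= \partial_0 F \circ \Gamma[q] + (\partial_1 F \circ \Gamma[q])\, Dq \\ &= (\partial_0 F + (\partial_1 F)\dot{Q}) \circ \Gamma[q], \end{aligned} \tag{1.111}$$

where $\dot{Q} = I_2$ is a selector function:⁷² $c = \dot{Q}(a, b, c)$, so $Dq = \dot{Q} \circ \Gamma[q]$. The function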


$$D_t F = \partial_0 F + (\partial_1 F)\dot{Q} \tag{1.112}$$


is called the total time derivative of F; it is a function of three 
arguments: the time, the generalized coordinates, and the gener- 
alized velocities. 
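The defining property of the total time derivative can be checked numerically for one degree of freedom. The Python sketch below builds $D_t F = \partial_0 F + (\partial_1 F) v$ from central-difference partials. It then compares that with differencing $t \mapsto F(t, q(t))$ directly along a path. The function F and the path q are hypothetical examples chosen only for illustration.

```python
import math

def total_time_derivative(F, h=1e-6):
    """D_t F for a one-degree-of-freedom F(t, q):
    returns G(t, q, v) = d0F(t, q) + d1F(t, q) * v, with partials by central differences."""
    def G(t, q, v):
        d0F = (F(t + h, q) - F(t - h, q)) / (2 * h)
        d1F = (F(t, q + h) - F(t, q - h)) / (2 * h)
        return d0F + d1F * v
    return G

def derivative_along_path(F, q, t, h=1e-6):
    """D(F o Gamma[q])(t), by differencing t -> F(t, q(t)) directly."""
    return (F(t + h, q(t + h)) - F(t - h, q(t - h))) / (2 * h)

# Hypothetical example: F(t, q) = q^2 sin t along the path q(t) = cos 2t + t.
F = lambda t, q: q * q * math.sin(t)
q = lambda t: math.cos(2 * t) + t
Dq = lambda t: -2 * math.sin(2 * t) + 1

DtF = total_time_derivative(F)
t = 0.8
print(DtF(t, q(t), Dq(t)), derivative_along_path(F, q, t))
```

The first call composes $D_t F$ with the local tuple $(t, q(t), Dq(t))$. The second differentiates the composition $F \circ \Gamma[q]$. This is equation (1.113) in numerical form.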

In general, the total time derivative of a local-tuple function F
is that function $D_t F$ that, when composed with a local-tuple path,
is the time derivative of the composition of the function F with
the same local-tuple path:

$$D_t F \circ \Gamma[q] = D(F \circ \Gamma[q]). \tag{1.113}$$
The total time derivative $D_t F$ is explicitly given by

$$\begin{aligned} D_t F(t, q, v, a, \ldots) &= \partial_0 F(t, q, v, a, \ldots) \\ &\quad + \partial_1 F(t, q, v, a, \ldots)\, v \\ &\quad + \partial_2 F(t, q, v, a, \ldots)\, a + \cdots, \end{aligned} \tag{1.114}$$

where we take as many terms as needed to exhaust the arguments
of F.

72 Components of a tuple structure, such as the value of $\Gamma[q](t)$, can be selected
with selector functions: $I_i$ gets the element with index $i$ from the tuple.


Exercise 1.25: Properties of $D_t$


The total time derivative $D_t F$ is not the derivative of the function F.
Nevertheless, the total time derivative shares many properties with the
derivative. Demonstrate that $D_t$ has the following properties for
local-tuple functions F and G, number c, and a function H with domain
containing the range of G.


a. $D_t(F + G) = D_t F + D_t G$

b. $D_t(cF) = c\, D_t F$

c. $D_t(FG) = F\, D_t G + (D_t F)\, G$

d. $D_t(H \circ G) = (DH \circ G)\, D_t G$


Adding total time derivatives to Lagrangians 

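Consider two Lagrangians $L$ and $L'$ that differ by the addition of
a total time derivative of a function F that depends only on the
time and the coordinates:

$$L' = L + D_t F. \tag{1.115}$$

The corresponding action integral is

$$\begin{aligned} S'[q](t_1, t_2) &= \int_{t_1}^{t_2} L' \circ \Gamma[q] \\ &= \int_{t_1}^{t_2} (L + D_t F) \circ \Gamma[q] \\ &= \int_{t_1}^{t_2} \left( L \circ \Gamma[q] + D(F \circ \Gamma[q]) \right) \\ &= S[q](t_1, t_2) + (F \circ \Gamma[q])\Big|_{t_1}^{t_2}. \end{aligned} \tag{1.116}$$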


The variational principle states that the action integral along a
realizable trajectory is stationary with respect to variations of the
trajectory that leave the configuration at the endpoints fixed. The
action integrals $S[q](t_1, t_2)$ and $S'[q](t_1, t_2)$ differ by a term

$$(F \circ \Gamma[q])\Big|_{t_1}^{t_2} = F(t_2, q(t_2)) - F(t_1, q(t_1)) \tag{1.117}$$

that depends only on the coordinates and time at the endpoints,
and these are not allowed to vary. Thus, if $S[q](t_1, t_2)$ is stationary
for a path, then $S'[q](t_1, t_2)$ will also be stationary. So either
Lagrangian can be used to distinguish the realizable paths.
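Equation (1.116) holds for any path, realizable or not. The difference between the two actions depends only on the endpoints. The Python sketch below checks this numerically with composite Simpson quadrature. The Lagrangian (a harmonic oscillator), the function $F(t, q) = t q^3$ and the path are all hypothetical choices made for the illustration.

```python
import math

def action(L, q, Dq, t1, t2, n=2000):
    """Composite Simpson approximation to the integral of L o Gamma[q] over [t1, t2] (n must be even)."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n + 1):
        t = t1 + k * h
        w = 1 if k in (0, n) else (4 if k % 2 == 1 else 2)
        total += w * L(t, q(t), Dq(t))
    return total * h / 3

m, k = 1.0, 4.0
L = lambda t, x, v: 0.5 * m * v * v - 0.5 * k * x * x
F = lambda t, x: t * x ** 3                         # hypothetical F(t, q)
DtF = lambda t, x, v: x ** 3 + 3 * t * x * x * v    # d0F + d1F * v
L_prime = lambda t, x, v: L(t, x, v) + DtF(t, x, v)

# Any path will do: the identity does not require a realizable path.
q = lambda t: math.sin(t) + 0.2 * t * t
Dq = lambda t: math.cos(t) + 0.4 * t

t1, t2 = 0.0, 2.0
difference = action(L_prime, q, Dq, t1, t2) - action(L, q, Dq, t1, t2)
endpoint_term = F(t2, q(t2)) - F(t1, q(t1))
print(difference, endpoint_term)
```

Because the extra term is fixed once the endpoints are fixed, it cannot change which path makes the action stationary.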

The addition of a total time derivative to a Lagrangian does
not affect whether the action is critical for a given path. So if two
Lagrangians differ by a total time derivative, the corresponding
Lagrange equations are equivalent in that the same paths satisfy
each. Moreover, the terms that the total time derivative introduces
into the action appear only in the endpoint condition. They do not
affect the Lagrange equations derived from the variation of the
action, so the Lagrange equations themselves are unchanged by the
addition of a total time derivative to a Lagrangian.


Exercise 1.26: Lagrange equations for total time derivatives 


Let $F(t, q)$ be a function of t and q only, with total time derivative

$$D_t F = \partial_0 F + \partial_1 F\, \dot{Q}. \tag{1.118}$$

Show explicitly that the Lagrange equations for $D_t F$ are identically zero,
and thus that the addition of $D_t F$ to a Lagrangian does not affect the
Lagrange equations.


The driven pendulum provides a nice illustration of adding total 
time derivatives to Lagrangians. The equation of motion for the 
driven pendulum (see section 1.6.2), 


$$m l^2 D^2\theta(t) + m l (g + D^2 y_s(t)) \sin\theta(t) = 0, \tag{1.119}$$


has an interesting and suggestive interpretation: it is the same as
the equation of motion of an undriven pendulum, except that the
acceleration of gravity g is augmented by the acceleration of the
pivot $D^2 y_s$. This intuitive interpretation was not apparent in the
Lagrangian derived as the difference of the kinetic and potential
energies in section 1.6.2. However, we can write an alternative
Lagrangian that has the same equations of motion and is as easy to
interpret as the equation of motion itself:


$$L'(t, \theta, \dot\theta) = \tfrac{1}{2} m l^2 \dot\theta^2 + m l (g + D^2 y_s(t)) \cos\theta. \tag{1.120}$$


With this Lagrangian it is apparent that the effect of the accelerating
pivot is to modify the acceleration of gravity. Note, however,
that it is not the difference of the kinetic and potential energies.
Let’s compare the two Lagrangians for the driven pendulum. The
difference $\Delta L = L - L'$ is


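$$\begin{aligned} \Delta L(t, \theta, \dot\theta) &= \tfrac{1}{2} m (D y_s(t))^2 + m l\, D y_s(t)\, \dot\theta \sin\theta \\ &\quad - g m\, y_s(t) - m l\, D^2 y_s(t) \cos\theta. \end{aligned} \tag{1.121}$$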


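The two terms in $\Delta L$ that depend on neither $\theta$ nor $\dot\theta$ do not
affect the equations of motion: each is a function of time alone, and
so is itself the total time derivative of a function of time. The
remaining two terms are the total time derivative of the function
$F(t, \theta) = -m l\, D y_s(t) \cos\theta$, which does not depend on $\dot\theta$. The
addition of such terms to a Lagrangian does not affect the equations of motion.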
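The decomposition of $\Delta L$ can be confirmed term by term. The Python sketch below evaluates both Lagrangians for the periodic drive $y_s(t) = a \cos \omega t$ used later in this section, with the parameter values from the evolve example. It removes the time-only terms and compares what remains with $D_t F$ for $F(t, \theta) = -m l\, D y_s(t) \cos\theta$. Only ordinary floating-point arithmetic is involved, so agreement should be to rounding error.

```python
import math

# Parameters of the periodically driven pendulum (values assumed for illustration).
m, l, g, a, omega = 1.0, 1.0, 9.8, 0.1, 2.0 * math.sqrt(9.8)

def y_s(t):
    return a * math.cos(omega * t)

def Dy_s(t):
    return -a * omega * math.sin(omega * t)

def D2y_s(t):
    return -a * omega ** 2 * math.cos(omega * t)

def L_TV(t, th, thd):
    """T - V Lagrangian of the driven pendulum, from the composition above."""
    return (g * l * m * math.cos(th) - g * m * y_s(t)
            + 0.5 * l * l * m * thd ** 2
            + l * m * Dy_s(t) * thd * math.sin(th)
            + 0.5 * m * Dy_s(t) ** 2)

def L_alt(t, th, thd):
    """Alternative Lagrangian (1.120)."""
    return 0.5 * m * l * l * thd ** 2 + m * l * (g + D2y_s(t)) * math.cos(th)

def time_only_terms(t):
    """The two terms of Delta L that depend on t alone."""
    return 0.5 * m * Dy_s(t) ** 2 - g * m * y_s(t)

def Dt_F(t, th, thd):
    """D_t F for F(t, theta) = -m l Dy_s(t) cos(theta), i.e. d0F + d1F * thetadot."""
    return -m * l * D2y_s(t) * math.cos(th) + m * l * Dy_s(t) * math.sin(th) * thd

t, th, thd = 0.77, -1.9, -0.2
print(L_TV(t, th, thd) - L_alt(t, th, thd) - time_only_terms(t) - Dt_F(t, th, thd))
```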


Identification of total time derivatives 
If the local-tuple function G, with arguments (t, q, v), is the total 
time derivative of a function F, with arguments (t,q), then G 
must have certain properties. 

From equation (1.112), we see that G must be linear in the
generalized velocities

$$G(t, q, v) = G_1(t, q, v)\, v + G_2(t, q, v) \tag{1.122}$$

where neither $G_1$ nor $G_2$ depends on the generalized velocities:
$\partial_2 G_1 = \partial_2 G_2 = 0$.

If G is the total time derivative of F then $G_1 = \partial_1 F$ and
$G_2 = \partial_0 F$, so


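$$\begin{aligned} \partial_0 G_1 &= \partial_0 \partial_1 F \\ \partial_1 G_2 &= \partial_1 \partial_0 F. \end{aligned} \tag{1.123}$$

The partial derivative with respect to the time argument does
not have structure, so $\partial_0 \partial_1 F = \partial_1 \partial_0 F$. So if G is the total time
derivative of F then

$$\partial_0 G_1 = \partial_1 G_2. \tag{1.124}$$

Furthermore, $G_1 = \partial_1 F$, so

$$\partial_1 G_1 = \partial_1 \partial_1 F. \tag{1.125}$$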


If there is more than one degree of freedom these partials are
actually structures of partial derivatives with respect to each coordinate.
The partial derivatives with respect to two different
coordinates must be the same independent of the order of the
differentiation. So $\partial_1 G_1$ must be symmetric.




Note that we have not shown that these conditions are sufficient 
for determining that a function is a total time derivative, only that 
they are necessary. 
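The necessary conditions (1.124) and the symmetry of $\partial_1 G_1$ can be spot-checked numerically at a single point. The Python sketch below does this for two degrees of freedom with central-difference partials. Both test functions are hypothetical and are not from the exercise. The first is $D_t(x y e^{-t})$. The second, $x v_y - y v_x$, violates the symmetry condition. Passing the check only rules out the obvious failures: as the text says, it does not prove that G is a total time derivative.

```python
import math

def partial_fd(f, point, i, h=1e-5):
    """Central-difference partial derivative of scalar f at point (a list) with respect to point[i]."""
    up = list(point)
    up[i] += h
    dn = list(point)
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def passes_necessary_conditions(G1, G2, t, q, tol=1e-6):
    """G(t, q, v) = sum_j G1[j](t, q) v_j + G2(t, q).
    Tests d0 G1_j == d_j G2 and d_i G1_j == d_j G1_i at the point (t, q)."""
    n = len(q)
    pt = [t] + list(q)            # index 0 is time, indices 1..n are coordinates
    for j in range(n):
        if abs(partial_fd(G1[j], pt, 0) - partial_fd(G2, pt, j + 1)) > tol:
            return False
        for i in range(n):
            if abs(partial_fd(G1[j], pt, i + 1) - partial_fd(G1[i], pt, j + 1)) > tol:
                return False
    return True

# Hypothetical example 1: G = D_t(x y e^{-t}) = e^{-t}(y v_x + x v_y) - x y e^{-t}.
G1_ok = [lambda t, x, y: y * math.exp(-t), lambda t, x, y: x * math.exp(-t)]
G2_ok = lambda t, x, y: -x * y * math.exp(-t)

# Hypothetical example 2: G = x v_y - y v_x, whose d1 G1 is antisymmetric.
G1_rot = [lambda t, x, y: -y, lambda t, x, y: x]
G2_rot = lambda t, x, y: 0.0

print(passes_necessary_conditions(G1_ok, G2_ok, 0.3, (1.2, -0.7)),
      passes_necessary_conditions(G1_rot, G2_rot, 0.3, (1.2, -0.7)))
```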


Exercise 1.27: Identifying total time derivatives 


For each of the following functions, either show that it is not a total 
time derivative or produce a function from which it can be derived. 


a. $G(t, x, v_x) = m v_x$

b. $G(t, x, v_x) = m v_x \cos t$

c. $G(t, x, v_x) = v_x \cos t - x \sin t$

d. $G(t, x, v_x) = v_x \cos t + x \sin t$

e. $G(t; x, y; v_x, v_y) = 2(x v_x + y v_y) \cos t - (x^2 + y^2) \sin t$

f. $G(t; x, y; v_x, v_y) = 2(x v_x + y v_y) \cos t - (x^2 + y^2) \sin t + y^3 v_x + x v_y$


1.7 Evolution of Dynamical State 


Lagrange’s equations are ordinary differential equations that the 
path must satisfy. They can be used to test if a proposed path is 
a realizable path of the system. However, we can also use them 
to develop a path, starting with initial conditions. 

The state of a system is defined to be the information that 
must be specified for the subsequent evolution to be determined. 
Remember our juggler: he or she must throw the pin in a cer- 
tain way for it to execute the desired motion. The juggler has 
control of the initial position and orientation of the pin, and the 
initial velocity and spin of the pin. Our experience with juggling 
and similar systems suggests that the initial configuration and the 
rate of change of the configuration are sufficient to determine the 
subsequent motion. Other systems may require higher derivatives 
of the configuration. 

For Lagrangians that are written in terms of a set of generalized
coordinates and velocities we have shown that Lagrange’s equations
are second-order ordinary differential equations. If the differential
equations can be solved for the highest-order derivatives
and if the differential equations satisfy appropriate conditions,⁷³
then there is a unique solution to the initial-value problem: given
values of the solution and the lower derivatives of the solution at
a particular moment, there is a unique solution function. Given
irredundant coordinates the Lagrange equations satisfy these
conditions.⁷⁴ Thus a trajectory is determined by the generalized
coordinates and the generalized velocities at any time. This is the
information required to specify the dynamical state.

73 For example, the Lipschitz condition is that the rate of change of the
derivative is bounded by a constant in an open set around each point of the
trajectory. See [22] for a good treatment of the Lipschitz condition.

A complete local description of a path consists of the path and 
all of its derivatives at a moment. The complete local descrip- 
tion of a path can be reconstructed from an initial segment of 
the local tuple, given a prescription for computing higher-order 
derivatives of the path in terms of lower-order derivatives. The 
state of the system is specified by that initial segment of the local 
tuple from which the rest of the complete local description can be 
deduced. The complete local description gives us the path near 
that moment. Actually, all we need is a rule for computing the 
next higher derivative; we can get all the rest from this. Assume 
that the state of a system is given by the tuple (t,q, v). If we are 
given a prescription for computing the acceleration a = A(t,q,v), 
then 


$$D^2 q = A \circ \Gamma[q], \tag{1.126}$$
and we have as a consequence

$$D^3 q = D(A \circ \Gamma[q]) = D_t A \circ \Gamma[q], \tag{1.127}$$

and so on. So the higher derivative components of the local tuple
are given by functions $D_t A, D_t^2 A, \ldots$. Each of these functions
depends on lower derivative components of the local tuple. All we 
need to deduce the path from the state is a function that gives 
the next higher derivative component of the local description from 
the state. We use the Lagrange equations to find this function. 


74 If the coordinates are redundant we cannot, in general, solve for the highest-order
derivative. However, since we can transform to irredundant coordinates,
and since we can solve the initial-value problem in the irredundant coordinates,
and since we can construct the redundant coordinates from the irredundant
coordinates, we can in general solve the initial-value problem for redundant
coordinates. The only hitch is that we may not specify arbitrary initial
conditions: the initial conditions must be consistent with the constraints.
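First, we expand the Lagrange equations

$$\partial_1 L \circ \Gamma[q] = D(\partial_2 L \circ \Gamma[q])$$

so that the second derivative appears explicitly:

$$\partial_1 L \circ \Gamma[q] = \partial_0 \partial_2 L \circ \Gamma[q] + (\partial_1 \partial_2 L \circ \Gamma[q])\, Dq + (\partial_2 \partial_2 L \circ \Gamma[q])\, D^2 q.$$

Solving this system for $D^2 q$ one obtains the generalized acceleration
along a solution path q:

$$D^2 q = [\partial_2 \partial_2 L \circ \Gamma[q]]^{-1} \left[ \partial_1 L \circ \Gamma[q] - (\partial_1 \partial_2 L \circ \Gamma[q])\, Dq - \partial_0 \partial_2 L \circ \Gamma[q] \right],$$

where $[\partial_2 \partial_2 L \circ \Gamma[q]]^{-1}$ is the inverse of the Hessian matrix of L with
respect to the generalized velocities. The function that gives the acceleration is

$$A = (\partial_2 \partial_2 L)^{-1} \left[ \partial_1 L - \partial_0 \partial_2 L - (\partial_1 \partial_2 L)\dot{Q} \right], \tag{1.128}$$

where $\dot{Q} = I_2$ is the velocity component selector.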
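For one degree of freedom the Hessian is a number, and equation (1.128) can be evaluated directly. The Python sketch below is a rough numerical analogue of Lagrangian->acceleration, not the Scmutils procedure. Every partial derivative is estimated by central differences, including the mixed second partials. It is tried on a harmonic oscillator, where $A = -kx/m$, and on the driven pendulum, where the expected acceleration is the one in the state derivative printed later in this section.

```python
import math

def lagrangian_to_acceleration(L, h=1e-4):
    """One degree of freedom:
    A(t, q, v) = [d1L - d0d2L - (d1d2L) v] / d2d2L   (equation 1.128),
    with every partial derivative estimated by central differences."""
    def A(t, q, v):
        d1L = (L(t, q + h, v) - L(t, q - h, v)) / (2 * h)
        d2d2L = (L(t, q, v + h) - 2 * L(t, q, v) + L(t, q, v - h)) / (h * h)

        def mixed(dt, dq):
            # derivative of d2L with respect to time (dt=h, dq=0) or coordinate (dt=0, dq=h)
            return (L(t + dt, q + dq, v + h) - L(t + dt, q + dq, v - h)
                    - L(t - dt, q - dq, v + h) + L(t - dt, q - dq, v - h)) / (4 * h * h)

        d0d2L = mixed(h, 0.0)
        d1d2L = mixed(0.0, h)
        return (d1L - d0d2L - d1d2L * v) / d2d2L
    return A

def harmonic_lagrangian(m, k):
    return lambda t, x, v: 0.5 * m * v * v - 0.5 * k * x * x

def driven_pendulum_lagrangian(m, l, g, a, omega):
    """The T - V Lagrangian found earlier, with y_s(t) = a cos(omega t)."""
    def L(t, th, thd):
        ys = a * math.cos(omega * t)
        Dys = -a * omega * math.sin(omega * t)
        return (g * l * m * math.cos(th) - g * m * ys + 0.5 * l * l * m * thd ** 2
                + l * m * Dys * thd * math.sin(th) + 0.5 * m * Dys ** 2)
    return L

A_h = lagrangian_to_acceleration(harmonic_lagrangian(2.0, 3.0))
print(A_h(0.1, 0.5, -0.7))   # compare with -k x / m for k = 3, x = 0.5, m = 2
```

A finite-difference Hessian is fragile when the Hessian is nearly singular. The exact derivatives used in the text do not have this weakness.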

That initial segment of the local tuple that specifies the state 
is called the local state tuple, or, more simply, the state tuple. 

We can express the function that gives the acceleration as a 
function of the state tuple as the following procedure. It takes 
a procedure that computes the Lagrangian, and returns a pro- 
cedure that takes a state tuple as its argument and returns the 
acceleration. 


(define (Lagrangian->acceleration L)
  (let ((P ((partial 2) L))
        (F ((partial 1) L)))
    (/ (- F
          (+ ((partial 0) P)
             (* ((partial 1) P) velocity)))
       ((partial 2) P))))


75 In Scmutils division by a matrix is interpreted as multiplication on the left
by the inverse matrix.

Once we have a way of computing the acceleration from the
coordinates and the velocities, we can give a prescription for computing
the derivative of the state as a function of the state. For
the state $(t, q(t), Dq(t))$ at the moment t, the derivative of the state
is $(1, Dq(t), D^2 q(t)) = (1, Dq(t), A(t, q(t), Dq(t)))$. The procedure
Lagrangian->state-derivative takes a Lagrangian and returns
a procedure that takes a state and returns the derivative of the
state:


(define (Lagrangian->state-derivative L)
  (let ((acceleration (Lagrangian->acceleration L)))
    (lambda (state)
      (up 1
          (velocity state)
          (acceleration state)))))


We represent a state by an up-tuple of the components of that 
initial segment of the local tuple that determine the state. 

For example, the parametric state derivative for a harmonic 
oscillator is 


(define (harmonic-state-derivative m k)
  (Lagrangian->state-derivative (L-harmonic m k)))


(print-expression
 ((harmonic-state-derivative 'm 'k)
  (up 't (up 'x 'y) (up 'v_x 'v_y))))
(up 1 (up v_x v_y) (up (/ (* -1 k x) m) (/ (* -1 k y) m)))


The Lagrange equations are a second-order system of differential
equations that constrain realizable paths q. We can use the state
derivative to express the Lagrange equations as a first-order system
of differential equations that constrain realizable coordinate
paths q and velocity paths v:


(define ((Lagrange-equations-first-order L) q v)
  (let ((state-path (qv->state-path q v)))
    (- (D state-path)
       (compose (Lagrangian->state-derivative L)
                state-path))))


(define ((qv->state-path q v) t)
  (up t (q t) (v t)))
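The residual structure can be imitated in a few lines of Python. In the sketch below, tuples stand in for up structures and a central difference stands in for D. It computes D(state path) minus the state derivative applied to the state path for the two-dimensional harmonic oscillator. On an exact solution every residual is near zero. If the velocity path is not the derivative of the coordinate path, the second component exposes the mismatch. The parameter values and paths are assumptions chosen for the illustration.

```python
import math

def harmonic_state_derivative(m, k):
    """(t, (x, y), (vx, vy)) -> (1, (vx, vy), (-k x / m, -k y / m))."""
    def f(state):
        _t, (x, y), (vx, vy) = state
        return (1.0, (vx, vy), (-k * x / m, -k * y / m))
    return f

def lagrange_equations_first_order(state_derivative, q, v, h=1e-6):
    """Residual D(state path)(t) - state_derivative(state path(t)) for tuple-valued
    paths q and v. D is a central difference; the time slot's derivative is exactly 1."""
    def residual(t):
        ft, fq, fv = state_derivative((t, q(t), v(t)))
        dq = [(a - b) / (2 * h) for a, b in zip(q(t + h), q(t - h))]
        dv = [(a - b) / (2 * h) for a, b in zip(v(t + h), v(t - h))]
        return (1.0 - ft,
                tuple(a - b for a, b in zip(dq, fq)),
                tuple(a - b for a, b in zip(dv, fv)))
    return residual

m, k = 2.0, 1.0
w = math.sqrt(k / m)
q = lambda t: (math.cos(w * t), 2.0 * math.cos(w * t))             # an exact solution
v = lambda t: (-w * math.sin(w * t), -2.0 * w * math.sin(w * t))   # its derivative
res = lagrange_equations_first_order(harmonic_state_derivative(m, k), q, v)
print(res(0.7))
```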


For example, we can find the first-order form of the equations of 
motion of a two-dimensional harmonic oscillator: 
(show-expression
 (((Lagrange-equations-first-order (L-harmonic 'm 'k))
   (up (literal-function 'x)
       (literal-function 'y))
   (up (literal-function 'v_x)
       (literal-function 'v_y)))
  't))


The zero in the first element of the structure of the Lagrange-equation
residuals is just the tautology that time advances uniformly:
the time function is just the identity, so its derivative is 1
and the residual is zero. The equations in the second element
constrain the velocity path to be the derivative of the coordinate
path. The equations in the third element give the rate of change
of the velocity in terms of the applied forces.


Numerical integration 
A set of first-order ordinary differential equations that give the
state derivative in terms of the state can be integrated to find the 
state path that emanates from a given initial state. Numerical 
integrators find approximate solutions of such differential equa- 
tions by a process illustrated in figure 1.6. The state derivative 
produced by Lagrangian->state-derivative can be used by a 
package that numerically integrates systems of first-order ordinary 
differential equations. 

The procedure state-advancer can be used to find the state of
a system at a specified time, given an initial state, which includes
the initial time, and a parametric state-derivative procedure.⁷⁶

76 The Scmutils system provides a stable of numerical integration routines
that can be accessed through this interface. These include quality-controlled
Runge-Kutta (QCRK4) and Bulirsch-Stoer. The default integration method
is Bulirsch-Stoer.
Figure 1.6 The input to the system derivative is the state. The function
A gives the acceleration as a function of the components that determine
the state. The output of the system derivative is the derivative
of the state. The integrator takes the derivative of the state as its input
and produces the integrated state, starting at the initial conditions.
Notice how the second-order system is put into first-order form by the
routing of the $Dq(t)$ components in the system derivative.


For example, to advance the state of a two-dimensional harmonic
oscillator we write⁷⁷

(print-expression
 ((state-advancer harmonic-state-derivative 2. 1.)
  (up 0. (up 1. 2.) (up 3. 4.))
  10
  1.e-12))
(up 10.
    (up 3.712791664584467 5.420620823651575)
    (up 1.6148030925459906 1.8189103724750977))


The arguments to state-advancer are a parametric state derivative,
harmonic-state-derivative, and the state-derivative parameters
(mass 2. and spring constant 1.). A procedure is returned
that takes an initial state, (up 0. (up 1. 2.) (up 3. 4.)),
a target time, 10, and a relative error tolerance, 1.e-12.
The output is an approximation to the state at the specified final
time.

77 The procedure state-advancer automatically compiles state-derivative
procedures the first time they are encountered. The first time a new
state-derivative is used there is a delay while compilation occurs.

Consider the driven pendulum, described above, with a periodic
drive. We choose $y_s(t) = a \cos \omega t$.


(define ((periodic-drive amplitude frequency phase) t)
  (* amplitude (cos (+ (* frequency t) phase))))


(define (L-periodically-driven-pendulum m l g a omega)
  (let ((ys (periodic-drive a omega 0)))
    (L-pend m l g ys)))


Lagrange’s equation for this system is: 


(show-expression
 (((Lagrange-equations
    (L-periodically-driven-pendulum 'm 'l 'g 'a 'omega))
   (literal-function 'theta))
  't))


$$D^2\theta(t)\, l^2 m - \cos(\omega t) \sin(\theta(t))\, a l m \omega^2 + \sin(\theta(t))\, g l m$$


The parametric state derivative for the periodically driven pendu- 
lum is 


(define (pend-state-derivative m l g a omega)
  (Lagrangian->state-derivative
   (L-periodically-driven-pendulum m l g a omega)))


(show-expression
 ((pend-state-derivative 'm 'l 'g 'a 'omega)
  (up 't 'theta 'thetadot)))


$$\left(1,\ \dot\theta,\ \frac{a \omega^2 \cos(\omega t) \sin\theta}{l} - \frac{g \sin\theta}{l}\right)$$
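The printed state derivative translates directly into Python. The sketch below is a transcription for illustration, not Scmutils output. It checks that the acceleration it returns satisfies the equation of motion (1.119) with $D^2 y_s(t) = -a\omega^2 \cos \omega t$. Note that the mass m cancels out of the acceleration.

```python
import math

def pend_state_derivative(m, l, g, a, omega):
    """(t, theta, thetadot) -> (1, thetadot, thetaddot) for the pendulum whose support
    moves as y_s(t) = a cos(omega t). The mass m does not appear in the result."""
    def f(state):
        t, theta, thetadot = state
        thetaddot = (a * omega ** 2 * math.cos(omega * t) * math.sin(theta)
                     - g * math.sin(theta)) / l
        return (1.0, thetadot, thetaddot)
    return f

m, l, g, a, omega = 1.0, 1.0, 9.8, 0.1, 2.0 * math.sqrt(9.8)
f = pend_state_derivative(m, l, g, a, omega)
print(f((0.0, 1.0, 0.0)))
```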
To examine the evolution of the driven pendulum we need a
mechanism that evolves a system for some interval while moni- 
toring aspects of the system as it evolves. The procedure evolve 
provides this service, using state-advancer repeatedly to advance 
the state to the required moments. The procedure evolve takes a 
parametric state-derivative and its parameters and returns a pro- 
cedure that evolves the system from a specified initial state to a 
number of other times, monitoring some aspect of the state at 
those times. To generate a plot of the angle versus time we make 
a monitor procedure that generates the plot as the evolution proceeds:⁷⁸


(define ((monitor-theta win) state)
  (let ((theta ((principal-value :pi) (coordinate state))))
    (plot-point win (time state) theta)))


(define plot-win (frame 0. 100. :-pi :pi)) 


((evolve pend-state-derivative
         1.0                  ;m=1kg
         1.0                  ;l=1m
         9.8                  ;g=9.8m/s^2
         0.1                  ;a=1/10 m
         (* 2.0 (sqrt 9.8)))  ;omega
 (up 0.0                      ;t0=0
     1.                       ;theta0=1 radian
     0.)                      ;thetadot0=0 radians/s
 (monitor-theta plot-win)
 0.01                         ;step between plotted points
 100.0                        ;final time
 1.0e-13)                     ;local error tolerance


Figure 1.7 shows the angle $\theta$ versus time for a couple of orbits for
the driven pendulum. The initial conditions for the two runs are
the same except that in one the bob is given a tiny velocity equal to
$10^{-10}$ m/s, about one atom width per second. The initial segments
of the two orbits are indistinguishable. After about 75 seconds the
two orbits diverge and become completely different. This extreme
sensitivity to tiny changes in initial conditions is characteristic of
what is called chaotic behavior. Later, we will investigate this
example further, using other tools such as Lyapunov exponents,
phase space, and Poincaré sections.

78 The results are plotted in a plot-window that is created by the procedure
frame with arguments xmin, xmax, ymin, ymax, that specify the limits of the
plotting area. Points are added to the plot with the procedure plot-point
that takes a plot-window and the abscissa and ordinate of the point to be
plotted.

The procedure principal-value is used to reduce an angle to a standard
interval. The argument to principal-value is the point at which the circle is
to be cut. Thus (principal-value :pi) is a procedure that reduces an angle
$\theta$ to the interval $-\pi < \theta < \pi$.

Figure 1.7 Orbits of the driven pendulum. The angle $\theta$ is plotted
against time. Because angles are periodic, this plot may be thought
of as wound around a cylinder. The upper plot shows the results of a
simulation with initial conditions $\theta = 1$ and $\dot\theta = 0$. The orbit oscillates
for a while, then circulates, then resumes oscillating. In the lower plot
we show the result for a slightly different initial angular velocity, $\dot\theta = 10^{-10}$.
The initial behavior is indistinguishable from the top figure, but
the two trajectories become uncorrelated after the transition between
oscillation and circulation. This extreme sensitivity to initial conditions
is characteristic of systems with chaotic behavior.
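The two-run experiment can be set up without Scmutils. The Python sketch below is an approximation only. It uses fixed-step RK4 with step 1e-3 instead of evolve with a 1e-13 tolerance, and it reports the raw difference in $\theta$ rather than plotting. The parameters are those of the evolve call above, with $\dot\theta(0)$ equal to 0 in one run and $10^{-10}$ in the other. Over long runs the fixed-step result need not match the figure, so we make no claim about when the runs separate; we only show how the separation is measured.

```python
import math

def pend_accel(t, theta, l=1.0, g=9.8, a=0.1, omega=2.0 * math.sqrt(9.8)):
    """theta'' for the driven pendulum with y_s(t) = a cos(omega t); defaults follow the evolve call."""
    return (a * omega ** 2 * math.cos(omega * t) - g) * math.sin(theta) / l

def rk4_pendulum(theta, thetadot, t_final, h=1e-3):
    """Fixed-step RK4 from t = 0 to t_final; returns (theta, thetadot)."""
    steps = max(1, round(t_final / h))
    h = t_final / steps
    for i in range(steps):
        t = i * h
        k1 = (thetadot, pend_accel(t, theta))
        k2 = (thetadot + h / 2 * k1[1], pend_accel(t + h / 2, theta + h / 2 * k1[0]))
        k3 = (thetadot + h / 2 * k2[1], pend_accel(t + h / 2, theta + h / 2 * k2[0]))
        k4 = (thetadot + h * k3[1], pend_accel(t + h, theta + h * k3[0]))
        theta += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        thetadot += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return theta, thetadot

def separation(t_final, eps=1e-10):
    """|theta_a - theta_b| at t_final for runs that differ only by eps in thetadot(0)."""
    theta_a, _ = rk4_pendulum(1.0, 0.0, t_final)
    theta_b, _ = rk4_pendulum(1.0, eps, t_final)
    return abs(theta_a - theta_b)

for T in (1.0, 10.0):
    print(T, separation(T))
```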


1.8 Conserved Quantities 


A quantity that is a function of the state of the system that is 
constant along a solution path is called a conserved quantity or a 
constant of motion. If C is a conserved quantity, then 


$$D(C \circ \Gamma[q]) = D_t C \circ \Gamma[q] = 0 \tag{1.129}$$

for solution paths q. Following historical practice we also refer
to constants of the motion as integrals of the motion.⁷⁹ In this
section, we will investigate systems with symmetry and find that
symmetries are associated with conserved quantities. For instance,
linear momentum is conserved in a system with translational symmetry,
angular momentum is conserved if there is rotational symmetry,
and energy is conserved if the system does not depend on the
origin of time. We first consider systems for which a coordinate
system can be chosen that naturally expresses the symmetry, and 
later discuss systems for which no coordinate system can be chosen 
that simultaneously expresses all symmetries. 


1.8.1 Conserved Momenta 


If a Lagrangian $L(t, q, v)$ does not depend on some particular coordinate
$q^i$, then

$$(\partial_1 L)_i = 0, \tag{1.130}$$

and the corresponding ith component of the Lagrange equations
is

$$(D(\partial_2 L \circ \Gamma[q]))_i = 0. \tag{1.131}$$


79 In the older literature conserved quantities are sometimes called first integrals.
This is the same as⁸⁰

$$D((\partial_2 L)_i \circ \Gamma[q]) = 0. \tag{1.132}$$
So we see that

$$P_i = (\partial_2 L)_i \tag{1.133}$$


is a conserved quantity. The function P is called the momentum
state function. The value of the momentum state function
is the generalized momentum. We refer to the ith component of the
generalized momentum as the momentum conjugate to the ith
coordinate.⁸¹ A generalized coordinate component that does not
appear explicitly in the Lagrangian is called a cyclic coordinate. 
The generalized momentum component conjugate to any cyclic 
coordinate is a constant of the motion. Its value is constant along 
realizable paths; it may have different values on different paths. 
As we will see, momentum is an important quantity even when it 
is not conserved. 

Given the coordinate path q and the Lagrangian L, the momentum
path p is

$$p = \partial_2 L \circ \Gamma[q] = P \circ \Gamma[q], \tag{1.134}$$
with components

$$p_i = P_i \circ \Gamma[q]. \tag{1.135}$$

The momentum path is well defined for any path q. If the path is
realizable and the Lagrangian does not depend on $q^i$, then $p_i$ is a
constant function:

$$D p_i = 0. \tag{1.136}$$

The constant value of $p_i$ may be different for different trajectories.


80 The derivative of a component is equal to the component of the derivative.


81 Observe that we indicate a component of the generalized momentum with 
a subscript, and indicate a component of the generalized coordinates with a 
superscript. These conventions are consistent with the ones that are commonly 
used in tensor algebra, which is sometimes helpful in working out complex 
problems. 
Examples of conserved momenta
The free-particle Lagrangian $L(t, x, v) = \tfrac{1}{2} m v^2$ is independent of
x. So the momentum state function, $P(t, q, v) = m v$, is conserved
along realizable paths. The momentum path p for the coordinate
path q is $p(t) = P \circ \Gamma[q](t) = m\, Dq(t)$. For a realizable path
$Dp(t) = 0$. For the free particle the usual linear momentum is
conserved for realizable paths.

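For a particle in a central force field (section 1.6), the Lagrangian

$$L(t; r, \varphi; \dot r, \dot\varphi) = \tfrac{1}{2} m (\dot r^2 + r^2 \dot\varphi^2) - V(r)$$

depends on r but is independent of $\varphi$. The momentum state
function is

$$P(t; r, \varphi; \dot r, \dot\varphi) = [m \dot r,\ m r^2 \dot\varphi].$$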


It has two components. The first component, “the radial mo- 
mentum,” is not conserved. The second component, “the angular 
momentum,” is conserved along any solution trajectory. 
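The conservation of the angular momentum, and the non-conservation of the radial momentum, can be watched numerically. The Python sketch below integrates the polar-coordinate Lagrange equations with fixed-step RK4. The text leaves V arbitrary, so we assume the potential $V(r) = -k/r$ and choose initial conditions that give a bound, non-circular orbit; both are our assumptions. It records $p_r = m\dot r$ and $p_\varphi = m r^2 \dot\varphi$ after every step. The small drift in $p_\varphi$ is integrator error. RK4 does not conserve it exactly.

```python
def central_force_run(m=1.0, k=1.0, state=(1.0, 0.0, 0.0, 0.8), t_final=5.0, h=1e-3):
    """Integrate the polar Lagrange equations for the assumed potential V(r) = -k/r.
    state = (r, phi, rdot, phidot). Returns lists of p_r = m*rdot and p_phi = m*r^2*phidot."""
    def deriv(s):
        r, _phi, rdot, phidot = s
        rddot = r * phidot ** 2 - k / (m * r * r)   # m r'' = m r phi'^2 - V'(r)
        phiddot = -2.0 * rdot * phidot / r          # from d/dt (m r^2 phi') = 0
        return (rdot, phidot, rddot, phiddot)

    def add(s, c, ds):
        return tuple(a + c * b for a, b in zip(s, ds))

    s = state
    p_r, p_phi = [m * s[2]], [m * s[0] ** 2 * s[3]]
    for _ in range(round(t_final / h)):
        k1 = deriv(s)
        k2 = deriv(add(s, h / 2, k1))
        k3 = deriv(add(s, h / 2, k2))
        k4 = deriv(add(s, h, k3))
        s = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
        p_r.append(m * s[2])
        p_phi.append(m * s[0] ** 2 * s[3])
    return p_r, p_phi

p_r, p_phi = central_force_run()
print("p_phi spread:", max(p_phi) - min(p_phi))
print("p_r range:", min(p_r), max(p_r))
```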

If the central potential problem had been expressed in rectan- 
gular coordinates, then all of the coordinates would have appeared 
in the Lagrangian. In that case there would not be any obvious 
conserved quantities. Nevertheless, the motion of the system does 
not depend on the choice of coordinates; so the angular momen- 
tum is still conserved. 

We see that there is great advantage in making a judicious 
choice for the coordinate system. If we can choose the coordinates 
so that a symmetry of the system is reflected in the Lagrangian 
by the absence of some coordinate component, then the existence
of a corresponding conserved quantity will be automatic.⁸²


1.8.2 Energy Conservation 


Momenta are conserved by the motion if the Lagrangian does not
depend on the corresponding coordinate. There is another constant
of the motion, the energy, if the Lagrangian $L(t, q, \dot q)$ does
not depend explicitly on the time: $\partial_0 L = 0$.

82 In general, conserved quantities in a physical system are associated with
continuous symmetries, whether or not one can find a coordinate system in
which the symmetry is apparent. This powerful notion was formalized and a
theorem linking conservation laws with symmetries was proved by E. Noether
early in the 20th century. See section 1.8.4 on Noether’s theorem.

Consider the time derivative of the Lagrangian along a solution 
path q: 


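$$D(L \circ \Gamma[q]) = \partial_0 L \circ \Gamma[q] + (\partial_1 L \circ \Gamma[q])\, Dq + (\partial_2 L \circ \Gamma[q])\, D^2 q. \tag{1.137}$$
Using Lagrange’s equations to rewrite the second term:

$$D(L \circ \Gamma[q]) = \partial_0 L \circ \Gamma[q] + D(\partial_2 L \circ \Gamma[q])\, Dq + (\partial_2 L \circ \Gamma[q])\, D^2 q. \tag{1.138}$$

Isolating $\partial_0 L$ and combining the last two terms of (1.138) with the product rule:

$$\begin{aligned} (\partial_0 L) \circ \Gamma[q] &= D(L \circ \Gamma[q]) - D((\partial_2 L \circ \Gamma[q])\, Dq) \\ &= D(L \circ \Gamma[q]) - D((\partial_2 L \circ \Gamma[q])(\dot{Q} \circ \Gamma[q])) \\ &= D((L - P\dot{Q}) \circ \Gamma[q]), \end{aligned} \tag{1.139}$$

where, as before, $\dot{Q}$ selects the velocity from the state. So we see
that if $\partial_0 L = 0$ then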


$$\mathcal{E} = P\dot{Q} - L, \tag{1.140}$$


is conserved along realizable paths. The function $\mathcal{E}$ is called
the energy state function.⁸³ Let $E = \mathcal{E} \circ \Gamma[q]$ denote the energy
function on the path q. The energy function has a constant value
along any realizable trajectory if the Lagrangian has no explicit
time dependence; the energy E may have a different value for different
trajectories. A system that has no explicit time dependence
is called autonomous.
Given a Lagrangian L, we may compute the energy: 


(define (Lagrangian->energy L)
  (let ((P ((partial 2) L)))
    (- (* P velocity) L)))
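Lagrangian->energy has a simple one-degree-of-freedom analogue in Python. In the sketch below, $\partial_2 L$ is estimated by a central difference in the velocity, and the energy is $\mathcal{E} = \partial_2 L \cdot v - L$. It is applied to a harmonic oscillator and to the same oscillator with the total time derivative $c\,x v = D_t(\tfrac{1}{2} c x^2)$ added. The Lagrangians and parameter values are assumptions for illustration.

```python
def lagrangian_to_energy(L, h=1e-6):
    """E(t, q, v) = d2L(t, q, v) * v - L(t, q, v) for one degree of freedom,
    with d2L estimated by a central difference in v."""
    def E(t, q, v):
        p = (L(t, q, v + h) - L(t, q, v - h)) / (2 * h)
        return p * v - L(t, q, v)
    return E

def harmonic(m, k):
    return lambda t, x, v: 0.5 * m * v * v - 0.5 * k * x * x

def harmonic_plus_total_derivative(m, k, c):
    # adds c*x*v, the total time derivative of c*x^2/2, to the harmonic Lagrangian
    return lambda t, x, v: 0.5 * m * v * v - 0.5 * k * x * x + c * x * v

E1 = lagrangian_to_energy(harmonic(2.0, 3.0))
E2 = lagrangian_to_energy(harmonic_plus_total_derivative(2.0, 3.0, 0.7))
print(E1(0.0, 0.5, -0.7), E2(0.0, 0.5, -0.7))
```

Both give $\tfrac{1}{2} m v^2 + \tfrac{1}{2} k x^2$. In general, adding $D_t F$ changes the energy state function by $\partial_2(D_t F)\,v - D_t F = -\partial_0 F$. That is zero here because this F does not depend explicitly on time.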


83 The sign of the energy state function is a matter of convention.

Energy in terms of kinetic and potential energies

In some cases the energy can be written as the sum of kinetic and
potential energies. Suppose the system is composed of particles
with rectangular coordinates $\mathbf{x}_\alpha$, the movement of which may be
subject to constraints, and that these rectangular coordinates are
some functions of the generalized coordinates q and possibly time
t: $\mathbf{x}_\alpha = f_\alpha(t, q)$. We form the Lagrangian as $L = T - V$ and
compute the kinetic energy in terms of q by writing the rectangular
velocities in terms of the generalized velocities:


$$\mathbf{v}_\alpha = \partial_0 f_\alpha(t, q) + \partial_1 f_\alpha(t, q)\, v. \tag{1.141}$$


The kinetic energy is

$$T(t, q, v) = \tfrac{1}{2} \sum_\alpha m_\alpha v_\alpha^2, \tag{1.142}$$

where $v_\alpha$ is the magnitude of $\mathbf{v}_\alpha$.

If the $f_\alpha$ functions do not depend explicitly on time ($\partial_0 f_\alpha = 0$),
then the rectangular velocities are homogeneous functions of the 
generalized velocities of degree 1, and T is a homogeneous function 
of the generalized velocities of degree 2, because it is formed by 
summing the square of homogeneous functions of degree 1. If T is 
a homogeneous function of degree 2 in the generalized velocities 
then 


$$P\dot{Q} = (\partial_2 T)\dot{Q} = 2T, \tag{1.143}$$


where the second equality follows from Euler’s theorem on homogeneous
functions.⁸⁴ The energy state function is


$$\mathcal{E} = P\dot{Q} - L = P\dot{Q} - T + V. \tag{1.144}$$


So if $f_\alpha$ is independent of time, the energy function can be rewritten

$$\mathcal{E} = 2T - T + V = T + V. \tag{1.145}$$


Notice that if V depends on time the energy is still the sum of 
the kinetic energy and potential energy, but the energy is not 
conserved. 

The energy state function is always a well defined function, 
whether or not it can be written in the form of T+V, and whether 
or not it is conserved along realizable paths. 
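Equation (1.143) can be checked on a concrete kinetic energy. We borrow the spherical-coordinate T of section 1.8.3, whose coordinate map does not depend on time. The Python sketch below estimates $P = \partial_2 T$ by central differences and compares $P\dot{Q}$ with $2T$. It also checks the degree-2 homogeneity $T(\lambda v) = \lambda^2 T(v)$ that Euler’s theorem relies on. The sample mass, coordinates and velocities are arbitrary assumptions.

```python
import math

def T_spherical(m, q, v):
    """Kinetic energy (1/2) m (rdot^2 + r^2 thetadot^2 + r^2 sin^2(theta) phidot^2)."""
    r, theta, _phi = q
    rdot, thetadot, phidot = v
    return 0.5 * m * (rdot ** 2 + (r * thetadot) ** 2
                      + (r * math.sin(theta) * phidot) ** 2)

def momentum(m, q, v, h=1e-6):
    """P = d2T (equal to d2L when V does not depend on velocity), by central differences."""
    p = []
    for i in range(len(v)):
        vp = list(v)
        vp[i] += h
        vm = list(v)
        vm[i] -= h
        p.append((T_spherical(m, q, vp) - T_spherical(m, q, vm)) / (2 * h))
    return p

m, q, v = 1.5, (2.0, 0.6, 1.1), (0.3, -0.8, 1.7)
Pv = sum(pi * vi for pi, vi in zip(momentum(m, q, v), v))
print(Pv, 2 * T_spherical(m, q, v))
```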


84 Euler’s theorem says that if f is a function of $x = (x_0, x_1, \ldots)$ that is homogeneous
of degree n in the $x_i$, that is, $f(\lambda x) = \lambda^n f(x)$, then

$$\sum_i x_i\, \partial_i f(x) = n f(x).$$
Exercise 1.28:
An analogous result holds when the $f_\alpha$ do depend explicitly on time.


a. Show that in this case the kinetic energy contains terms that are 
linear in the generalized velocities. 


b. Show that, by adding a total time derivative, the Lagrangian can 
be written in the form L = A — B, where A is a homogeneous quadratic 
form in the generalized velocities, and B is velocity independent. 


c. Show, using Euler’s theorem, that the energy function is $\mathcal{E} = A + B$.


An example where terms that were linear in the velocity were removed 
from the Lagrangian by adding a total time derivative has already been 
given: the driven pendulum. 


Exercise 1.29: 


A particle of mass m slides off a horizontal cylinder of radius R in a 
uniform gravitational field with acceleration g. If the particle starts 
close to the top with zero initial speed, with what angular velocity does 
the particle leave the cylinder? 


1.8.3 Central Forces in Three Dimensions 


One important physical system is the motion of a particle in a cen- 
tral field in three dimensions, with an arbitrary potential energy 
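V(r) depending only on the radius. We will describe this system
in spherical coordinates r, $\theta$, and $\varphi$, where $\theta$ is the colatitude and
$\varphi$ is the longitude. The kinetic energy has three terms:

$$T(t; r, \theta, \varphi; \dot r, \dot\theta, \dot\varphi) = \tfrac{1}{2} m (\dot r^2 + r^2 \dot\theta^2 + r^2 (\sin\theta)^2 \dot\varphi^2).$$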
As a procedure: 


(define ((T3-spherical m) state)
  (let ((t (time state))
        (q (coordinate state))
        (qdot (velocity state)))
    (let ((r (ref q 0))
          (theta (ref q 1))
          (phi (ref q 2))
          (rdot (ref qdot 0))
          (thetadot (ref qdot 1))
          (phidot (ref qdot 2)))
      (* 1/2 m
         (+ (square rdot)
            (square (* r thetadot))
            (square (* r (sin theta) phidot)))))))
The Lagrangian is then formed by subtracting the potential energy:


(define (L3-central m Vr)
  (define (Vs state)
    (let ((r (ref (coordinate state) 0)))
      (Vr r)))
  (- (T3-spherical m) Vs))


Let’s first look at the generalized forces (the derivatives of the La- 
grangian with respect to the generalized coordinates). We com- 
pute these with a partial derivative with respect to the coordinate 
argument of the Lagrangian: 


(show-expression
 (((partial 1) (L3-central 'm (literal-function 'V)))
  (up 't
      (up 'r 'theta 'phi)
      (up 'rdot 'thetadot 'phidot))))


$$ \left[\; m r \dot\varphi^2 (\sin\theta)^2 + m r \dot\theta^2 - DV(r), \;\;\; m r^2 \dot\varphi^2 \cos\theta \sin\theta, \;\;\; 0 \;\right] $$


The φ component of the force is zero because φ does not appear
in the Lagrangian (it is a cyclic variable). The corresponding
momentum component is conserved. Compute the momenta:


(show-expression
 (((partial 2) (L3-central 'm (literal-function 'V)))
  (up 't
      (up 'r 'theta 'phi)
      (up 'rdot 'thetadot 'phidot))))


$$ \left[\; m \dot r, \;\;\; m r^2 \dot\theta, \;\;\; m r^2 \dot\varphi (\sin\theta)^2 \;\right] $$


The momentum conjugate to φ is conserved. This is the z component of the angular momentum r⃗ × (m v⃗), for vector position r⃗ and linear momentum m v⃗. We can show this by writing the z component of the angular momentum in spherical coordinates:


(define ((ang-mom-z m) state)
  (let ((q (coordinate state))
        (v (velocity state)))
    (ref (cross-product q (* m v)) 2)))


(define (s->r state)
  (let ((q (coordinate state)))
    (let ((r (ref q 0))
          (theta (ref q 1))
          (phi (ref q 2)))
      (let ((x (* r (sin theta) (cos phi)))
            (y (* r (sin theta) (sin phi)))
            (z (* r (cos theta))))
        (up x y z)))))


(show-expression
 ((compose (ang-mom-z 'm) (F->C s->r))
  (up 't
      (up 'r 'theta 'phi)
      (up 'rdot 'thetadot 'phidot))))


$$ m r^2 \dot\varphi (\sin\theta)^2 $$
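The same identity can be checked numerically. In this Python sketch (the helper names are hypothetical, not from the text), the rectangular velocity is obtained from the spherical state by the chain rule, and the z component of x⃗ × (m v⃗) is compared with the momentum m r² φ̇ (sin θ)² conjugate to φ:

```python
import math

def spherical_to_rectangular_state(q, qdot):
    """Rectangular position and velocity for a spherical state (chain rule)."""
    r, theta, phi = q
    rdot, thetadot, phidot = qdot
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    x = (r * st * cp, r * st * sp, r * ct)
    v = (rdot * st * cp + r * ct * cp * thetadot - r * st * sp * phidot,
         rdot * st * sp + r * ct * sp * thetadot + r * st * cp * phidot,
         rdot * ct - r * st * thetadot)
    return x, v

def ang_mom_z(m, x, v):
    """z component of the angular momentum x cross (m v)."""
    return m * (x[0] * v[1] - x[1] * v[0])

def p_phi(m, q, qdot):
    """Momentum conjugate to phi for the central-field Lagrangian."""
    r, theta, _ = q
    return m * r**2 * qdot[2] * math.sin(theta)**2

q, qdot = (1.5, 0.7, 0.3), (0.4, -0.2, 1.1)
x, v = spherical_to_rectangular_state(q, qdot)
print(ang_mom_z(2.0, x, v), p_phi(2.0, q, qdot))   # agree up to rounding
```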


The choice of the z axis is arbitrary, so the conservation of any
component of the angular momentum implies the conservation of
all components. Thus the total angular momentum is conserved.
We can choose the z axis so that all of the angular momentum is in the
z component. The angular momentum must be perpendicular to
both the radius vector and the linear momentum vector. Thus
the motion is planar: θ = π/2 and θ̇ = 0. Planar motion in a
central-force field was discussed in section 1.6.
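The conservation of the full angular momentum vector, and the resulting planarity, can be observed in a simulation. This is a sketch under assumptions of our own: the potential V(r) = −k/r, a fixed-step fourth-order Runge-Kutta integrator, and the helper names are illustrative, not taken from the text. Because the integrator is approximate, the angular momentum is conserved only up to a small numerical error:

```python
import math

def accel(x, m, k):
    """Acceleration for the (assumed) central potential V(r) = -k/r."""
    r = math.sqrt(sum(c * c for c in x))
    return [-k * c / (m * r**3) for c in x]

def rk4_step(x, v, dt, m, k):
    """One fourth-order Runge-Kutta step for dx/dt = v, dv/dt = accel(x)."""
    def shift(a, b, s):
        return [ai + s * bi for ai, bi in zip(a, b)]
    k1x, k1v = v, accel(x, m, k)
    k2x, k2v = shift(v, k1v, dt / 2), accel(shift(x, k1x, dt / 2), m, k)
    k3x, k3v = shift(v, k2v, dt / 2), accel(shift(x, k2x, dt / 2), m, k)
    k4x, k4v = shift(v, k3v, dt), accel(shift(x, k3x, dt), m, k)
    x_new = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
    v_new = [vi + dt / 6 * (a + 2 * b + 2 * c + d)
             for vi, a, b, c, d in zip(v, k1v, k2v, k3v, k4v)]
    return x_new, v_new

def angular_momentum(x, v, m):
    """The vector x cross (m v)."""
    return [m * (x[1] * v[2] - x[2] * v[1]),
            m * (x[2] * v[0] - x[0] * v[2]),
            m * (x[0] * v[1] - x[1] * v[0])]

m, k = 1.0, 1.0
x, v = [1.0, 0.2, -0.3], [0.1, 0.9, 0.4]
L0 = angular_momentum(x, v, m)
for _ in range(5000):                       # integrate to t = 5 with dt = 1e-3
    x, v = rk4_step(x, v, 1e-3, m, k)
L1 = angular_momentum(x, v, m)
drift = max(abs(a - b) for a, b in zip(L0, L1))
off_plane = sum(a * b for a, b in zip(x, L0))   # x . L0 should stay near 0
print(drift, off_plane)
```

The second printed number measures how far the particle has left the plane perpendicular to the initial angular momentum.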

We can also see that the energy state function computed from 
the Lagrangian for a central field is in fact T + V: 


(show-expression
 ((Lagrangian->energy (L3-central 'm (literal-function 'V)))
  (up 't
      (up 'r 'theta 'phi)
      (up 'rdot 'thetadot 'phidot))))
$$ \tfrac{1}{2} m r^2 \dot\varphi^2 (\sin\theta)^2 + \tfrac{1}{2} m r^2 \dot\theta^2 + \tfrac{1}{2} m \dot r^2 + V(r) $$


The energy is conserved because the Lagrangian has no explicit 
time dependence. 
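The energy state function can also be evaluated numerically from its definition ℰ = ∂₂L · q̇ − L, here with the momenta estimated by central differences. This is a Python sketch under our own assumptions (the helper names and the sample potential V(r) = −1/r are illustrative); it reproduces T + V:

```python
import math

def L3_central(m, V, q, qdot):
    """Central-field Lagrangian T - V in spherical coordinates."""
    r, theta, _phi = q
    rdot, thetadot, phidot = qdot
    T = 0.5 * m * (rdot**2 + (r * thetadot)**2
                   + (r * math.sin(theta) * phidot)**2)
    return T - V(r)

def energy_function(m, V, q, qdot, h=1e-5):
    """Lagrangian->energy: p . qdot - L, with p = d2 L by central differences."""
    p = []
    for i in range(3):
        up, dn = list(qdot), list(qdot)
        up[i] += h
        dn[i] -= h
        p.append((L3_central(m, V, q, up) - L3_central(m, V, q, dn)) / (2 * h))
    return sum(pi * vi for pi, vi in zip(p, qdot)) - L3_central(m, V, q, qdot)

V = lambda r: -1.0 / r
q, qdot = (1.5, 0.7, 0.3), (0.4, -0.2, 1.1)
T_plus_V = L3_central(2.0, V, q, qdot) + 2 * V(q[0])   # (T - V) + 2V = T + V
print(energy_function(2.0, V, q, qdot), T_plus_V)
```

Because the kinetic energy is quadratic in the velocities, the central difference for the momenta is exact up to rounding.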


Exercise 1.30: Driven spherical pendulum 

A spherical pendulum is a massive bob, subject to uniform gravity, that 
may swing in three dimensions, but remains at a given distance from 
the pivot. Formulate a Lagrangian for a spherical pendulum, driven 
by vertical motion of the pivot. What symmetry(ies) can you find? 
Find coordinates that express the symmetry. What is conserved? Give 
analytic expression(s) for the conserved quantity(ies). 


1.8.4 Noether’s Theorem 


We have seen that if a system has a symmetry and if a coordinate system can be chosen so that the Lagrangian does not depend on the coordinate associated with the symmetry, then there is a conserved quantity associated with the symmetry. However, there are more general symmetries for which there is no coordinate system that fully expresses the symmetry. For example, motion in a central potential is spherically symmetric (the dynamical system is invariant under rotations about any axis), but the expression of the Lagrangian for the system in spherical coordinates only exhibits symmetry around one axis. More generally, a Lagrangian has a symmetry if there is a coordinate transformation that leaves the Lagrangian unchanged. A continuous symmetry is a parametric family of symmetries. Here we show that for any continuous symmetry there is a conserved quantity.

Consider a parametric coordinate transformation F̃ with parameter s:⁸⁵

$$ x = \tilde F(s)(t, x'). \tag{1.146} $$


To this parametric coordinate transformation there corresponds a parametric state transformation C̃:

$$ (t, x, v) = \tilde C(s)(t, x', v'). \tag{1.147} $$


85 Noether’s theorem is more general than we state and prove it here. We assume the transformations F̃(s) have no dependence on the generalized velocities. Properly, we should also consider velocity-dependent symmetries.
We require that the transformation F̃(0) is the identity coordinate transformation x′ = F̃(0)(t, x′), and as a consequence C̃(0) is the identity state transformation (t, x′, v′) = C̃(0)(t, x′, v′). The Lagrangian L has a continuous symmetry corresponding to F̃ if it is invariant under the transformations

$$ \tilde L(s) = L \circ \tilde C(s) = L \tag{1.148} $$


for any s. The Lagrangian L is the same function as the transformed Lagrangian L̃(s).

That L̃(s) = L for any s implies DL̃(s) = 0. Explicitly, L̃(s) is

$$ \tilde L(s)(t, x', v') = L\big(t, \tilde F(s)(t, x'), D_t(\tilde F(s))(t, x', v')\big), \tag{1.149} $$


where we have rewritten the velocity component of C̃(s) in terms of the total time derivative. The derivative of L̃ is zero:

$$ \begin{aligned} 0 &= D\tilde L(s)(t, x', v') \\ &= \partial_1 L(t, x, v)\,(D\tilde F)(s)(t, x') + \partial_2 L(t, x, v)\, D_t(D\tilde F(s))(t, x', v'), \end{aligned} \tag{1.150} $$


where we have used the fact that⁸⁶

$$ D_t(D\tilde F(s)) = D\tilde G(s) \quad\text{with}\quad \tilde G(s) = D_t(\tilde F(s)). \tag{1.151} $$


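On a realizable path q we can use the Lagrange equations to rewrite the first term:

$$ \begin{aligned} 0 &= \big(D(\partial_2 L \circ \Gamma[q])\big)\big((D\tilde F)(s) \circ \Gamma[q']\big) \\ &\quad + (\partial_2 L \circ \Gamma[q])\big(D_t(D\tilde F(s)) \circ \Gamma[q']\big). \end{aligned} \tag{1.152} $$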


For s = 0 the paths q and q′ are the same, so Γ[q] = Γ[q′], and this equation becomes

$$ \begin{aligned} 0 &= \big((D_t \partial_2 L)((D\tilde F)(0)) + (\partial_2 L)(D_t(D\tilde F(0)))\big) \circ \Gamma[q] \\ &= D_t\big((\partial_2 L)(D\tilde F(0))\big) \circ \Gamma[q]. \end{aligned} \tag{1.153} $$

86 The total time derivative is like a derivative with respect to a real-number argument in that it does not generate structure, so it can commute with derivatives that generate structure. Be careful, though: it may not commute with some derivatives for other reasons. For example, Dₜ∂₁(F̃(s)) is the same as ∂₁Dₜ(F̃(s)), but Dₜ∂₂(F̃(s)) is not the same as ∂₂Dₜ(F̃(s)). The reason is that F̃(s) does not depend on the velocity, but Dₜ(F̃(s)) does.

Thus the state function ℐ,

$$ \mathcal{I} = (\partial_2 L)(D\tilde F(0)), \tag{1.154} $$


is conserved along solution trajectories. This is Noether’s inte- 
gral. The integral is the product of the momentum and a vector 
associated with the symmetry. 


Illustration: motion in a central potential 
For example, consider the central potential Lagrangian in rectan- 
gular coordinates: 


$$ L(t; x, y, z; v_x, v_y, v_z) = \tfrac{1}{2} m (v_x^2 + v_y^2 + v_z^2) - U\big(\sqrt{x^2 + y^2 + z^2}\big), \tag{1.155} $$


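and a parametric rotation R_z(s) about the z axis

$$ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = R_z(s) \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} x' \cos s - y' \sin s \\ x' \sin s + y' \cos s \\ z' \end{pmatrix}. \tag{1.156} $$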


The rotation is an orthogonal transformation, so

$$ x^2 + y^2 + z^2 = (x')^2 + (y')^2 + (z')^2. \tag{1.157} $$

Differentiating along a path, we get

$$ (v_x, v_y, v_z) = R_z(s)(v'_x, v'_y, v'_z), \tag{1.158} $$

so the velocities also transform by an orthogonal transformation, and $v_x^2 + v_y^2 + v_z^2 = (v'_x)^2 + (v'_y)^2 + (v'_z)^2$. Thus


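$$ \begin{aligned} L'(t; x', y', z'; v'_x, v'_y, v'_z) &= \tfrac{1}{2} m \big((v'_x)^2 + (v'_y)^2 + (v'_z)^2\big) \\ &\quad - U\Big(\sqrt{(x')^2 + (y')^2 + (z')^2}\Big), \end{aligned} \tag{1.159} $$

and we see that L′ is precisely the same function as L.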
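A numerical spot check of this invariance, as a Python sketch (the names L_central and Rz and the sample potential are assumptions of ours): transforming both the position and the velocity by the same rotation, as in (1.156) and (1.158), leaves the value of the Lagrangian unchanged up to rounding.

```python
import math

def L_central(m, U, x, v):
    """Central-potential Lagrangian in rectangular coordinates."""
    return 0.5 * m * sum(c * c for c in v) - U(math.sqrt(sum(c * c for c in x)))

def Rz(s, x):
    """Rotate the 3-tuple x about the z axis by the angle s."""
    c, sn = math.cos(s), math.sin(s)
    return (c * x[0] - sn * x[1], sn * x[0] + c * x[1], x[2])

U = lambda r: -1.0 / r
xp, vp = (1.0, 2.0, 3.0), (0.5, -0.25, 0.75)
for s in (0.0, 0.7, 2.5):
    # L' = L o C(s): rotate position and velocity by the same angle
    print(s, L_central(2.0, U, Rz(s, xp), Rz(s, vp)) - L_central(2.0, U, xp, vp))
```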
The momenta are 


$$ \partial_2 L(t; x, y, z; v_x, v_y, v_z) = [m v_x, m v_y, m v_z] \tag{1.160} $$




and

$$ D\tilde F(0)(t; x, y, z) = DR_z(0)(x, y, z) = (-y, x, 0). \tag{1.161} $$

So the Noether integral is

$$ \mathcal{I}(t; x, y, z; v_x, v_y, v_z) = \big((\partial_2 L)(D\tilde F(0))\big)(t; x, y, z; v_x, v_y, v_z) = m(x v_y - y v_x), \tag{1.162} $$

which we recognize as the z component of the angular momentum x⃗ × (m v⃗). Since the Lagrangian is preserved by any continuous rotational symmetry, all components of the vector angular momentum are conserved for the central potential problem.
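The recipe ℐ = (∂₂L)(DF̃(0)) can be followed numerically as well. In this Python sketch (helper names are hypothetical), both the momentum and the generator DR_z(0) are estimated with central differences. For the rotation (1.156) as written, the result is m(x v_y − y v_x), the z component of x⃗ × (m v⃗):

```python
import math

def L_central(m, U, x, v):
    """Central-potential Lagrangian in rectangular coordinates."""
    return 0.5 * m * sum(c * c for c in v) - U(math.sqrt(sum(c * c for c in x)))

def Rz(s, x):
    """Rotation about the z axis by the angle s."""
    c, sn = math.cos(s), math.sin(s)
    return (c * x[0] - sn * x[1], sn * x[0] + c * x[1], x[2])

def noether_integral(m, U, x, v, h=1e-6):
    """I = (d2 L)(D F(0)): momentum dotted with the rotation generator at s = 0."""
    p = []
    for i in range(3):
        up, dn = list(v), list(v)
        up[i] += h
        dn[i] -= h
        p.append((L_central(m, U, x, up) - L_central(m, U, x, dn)) / (2 * h))
    generator = [(a - b) / (2 * h) for a, b in zip(Rz(h, x), Rz(-h, x))]
    return sum(pi * gi for pi, gi in zip(p, generator))

x, v = (1.0, 2.0, 3.0), (0.5, -0.25, 0.75)
print(noether_integral(2.0, lambda r: -1.0 / r, x, v))   # about -2.5
```

Swapping Rz for a rotation about another axis yields the corresponding component of the angular momentum in the same way.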
The procedure calls ((Rx angle-x) q), ((Ry angle-y) q), and ((Rz angle-z) q) rotate the rectangular tuple q about the indicated axis by the indicated angle.⁸⁷ We use these to make a parametric coordinate transformation F-tilde:


(define (F-tilde angle-x angle-y angle-z)
  (compose (Rx angle-x) (Ry angle-y) (Rz angle-z) coordinate))


A Lagrangian for motion in a central potential is:

(define ((L-central-rectangular m U) state)
  (let ((q (coordinate state))
        (v (velocity state)))
    (- (* 1/2 m (square v))
       (U (sqrt (square q))))))


The Noether integral is then 


87 The definition of the procedure Rx is

(define ((Rx angle) q)
  (let ((ca (cos angle))
        (sa (sin angle)))
    (let ((x (ref q 0))
          (y (ref q 1))
          (z (ref q 2)))
      (up x
          (- (* ca y) (* sa z))
          (+ (* sa y) (* ca z))))))

The definitions of Ry and Rz are similar.
(define Noether-integral
  (let ((L (L-central-rectangular
            'm (literal-function 'U))))
    (* ((partial 2) L) ((D F-tilde) 0 0 0))))


(print-expression
 (Noether-integral
  (up 't
      (up 'x 'y 'z)
      (up 'vx 'vy 'vz))))
(down (+ (* m vy z) (* -1 m vz y))
      (+ (* m vz x) (* -1 m vx z))
      (+ (* m vx y) (* -1 m vy x)))


We get all three components of the angular momentum. 


1.9 Abstraction of Path Functions 


An essential step in the derivation of the local-tuple transformation function C from the coordinate transformation F was the deduction of the relationship between the velocities in the two coordinate systems. We did this by inserting coordinate paths into the coordinate transformation function F, differentiating, and then generalizing the results on the path to arbitrary velocities at a moment. The last step is an example of a more general problem of abstracting a local-tuple function from a path function. Given a function f of a local tuple, a corresponding path-dependent function f̄[q] is f̄[q] = f ∘ Γ[q]. Given f̄, how can we reconstitute f? The local-tuple function f depends on only a finite number of components of the local tuple, and f̄ only depends on the corresponding local components of the path. So f̄ has the same value for all paths that have that number of components of the local tuple in common. Given f̄ we can reconstitute f by taking the argument of f, which is a finite initial segment of a local tuple, constructing a path that has this local description, and finding the value of f̄ for this path.

Two paths that have the same local description up to the nth 
derivative are said to osculate with order n contact. For example, 
a path and the truncated power series representation of the path 
up to order n have order n contact; if fewer than n derivatives 
are needed by a local-tuple function, the path and the truncated 
power series representation are equivalent. Let O be a function 
that generates an osculating path with the given local tuple components. So O(t, q, v, ...)(t) = q, D(O(t, q, v, ...))(t) = v, and in general


$$ (t, q, v, \ldots) = \Gamma[O(t, q, v, \ldots)](t). \tag{1.163} $$


The number of components of the local tuple that are required is 
finite, but unspecified. One way of constructing O is through the 
truncated power series 


$$ O(t, q, v, a, \ldots)(t') = q + v\,(t' - t) + \tfrac{1}{2} a\,(t' - t)^2 + \cdots, \tag{1.164} $$


where the number of terms is the same as the number of compo- 
nents of the local tuple that are specified. 

Given the path function f̄ we reconstitute the f function as follows. We take the argument of f and construct an osculating path with this local description. Then the value of f is the value of f̄ for this osculating path:

$$ f(t, q, v, \ldots) = f \circ \Gamma[O(t, q, v, \ldots)](t) = \bar f[O(t, q, v, \ldots)](t). \tag{1.165} $$

Let Γ̄ be the function that takes a path function and returns the corresponding local-tuple function:

$$ f = \bar\Gamma(\bar f). \tag{1.166} $$

From equation (1.165) we see that

$$ \bar\Gamma(\bar f)(t, q, v, \ldots) = \bar f[O(t, q, v, \ldots)](t). \tag{1.167} $$


The procedure Gamma-bar implements the function Γ̄ that reconstitutes a path-dependent function into a local-tuple function:


(define ((Gamma-bar f-bar) local)
  ((f-bar (osculating-path local)) (time local)))


The procedure osculating-path takes a number of local compo- 
nents and returns a path with these components; it is implemented 
as a power series. 
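To make the construction concrete, here is a small numerical analogue in Python. It is an illustration under our own assumptions, not the scmutils implementation: paths are Python functions of time, a path function maps a path to a function of time, osculating_path is the truncated power series (1.164) for a scalar coordinate, and gamma_bar reconstitutes a local-tuple function as in (1.167). The path function used here (kinetic energy along a path, with the velocity taken by a central difference) is a hypothetical example:

```python
import math

def osculating_path(t0, *components):
    """Truncated power series O(t0, q, v, a, ...): sum_k c_k (t - t0)^k / k!."""
    def path(t):
        dt = t - t0
        return sum(c * dt**k / math.factorial(k) for k, c in enumerate(components))
    return path

def gamma_bar(f_bar):
    """Turn a path function into a function of a local tuple (t, q, v, ...)."""
    def local_fn(t, *components):
        return f_bar(osculating_path(t, *components))(t)
    return local_fn

def kinetic_energy_bar(m, h=1e-4):
    """A path function: t -> 1/2 m (Dq(t))^2, with Dq by a central difference."""
    def f_bar(q):
        return lambda t: 0.5 * m * ((q(t + h) - q(t - h)) / (2 * h))**2
    return f_bar

T_local = gamma_bar(kinetic_energy_bar(3.0))
print(T_local(1.0, 0.2, 0.5))   # about 1/2 * 3 * 0.5**2 = 0.375
```

Supplying an acceleration component as well, as in T_local(1.0, 0.2, 0.5, 4.0), gives the same value, because this path function only depends on the velocity.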

We can use Gamma-bar to construct the procedure F->C that 
takes a coordinate transformation F and generates the procedure 
that transforms local tuples. The procedure F->C constructs a 
path-dependent procedure f-bar that takes a coordinate path in 
the primed system and returns the local tuple of the corresponding 
path in the unprimed coordinate system. It then uses Gamma-bar to abstract f-bar to arbitrary local tuples in the primed coordinate system.


(define (F->C F)
  (define (f-bar q-prime)
    (define q
      (compose F (Gamma q-prime)))
    (Gamma q))
  (Gamma-bar f-bar))


(show-expression
 ((F->C p->r)
  (->local 't (up 'r 'theta) (up 'rdot 'thetadot))))


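$$ \left( t,\;\; \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix},\;\; \begin{pmatrix} -r\dot\theta\sin\theta + \dot r\cos\theta \\ r\dot\theta\cos\theta + \dot r\sin\theta \end{pmatrix} \right) $$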


Notice that in this definition of F->C we do not explicitly calculate 
any derivatives. The calculation that led up to the state transfor- 
mation (1.74) is not needed. 
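The same idea can be sketched numerically for the polar-to-rectangular case. This is Python with hypothetical helper names, and the velocity is estimated by a central difference rather than computed symbolically. Like F->C, the sketch never writes down the velocity transformation; it only builds a path, maps it through the coordinate transformation, and reads off the local tuple:

```python
import math

def osculating_path(t0, q, v):
    """First-order osculating path through coordinates q with velocity v at t0."""
    return lambda t: tuple(qi + vi * (t - t0) for qi, vi in zip(q, v))

def gamma_bar(f_bar):
    """Reconstitute a local-tuple function from a path function."""
    return lambda t, q, v: f_bar(osculating_path(t, q, v))(t)

def p_to_r(t, q):
    """Polar (r, theta) to rectangular (x, y)."""
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))

def F_to_C(F, h=1e-5):
    """Map a primed path to the local tuple (t, x, v) of the unprimed path."""
    def f_bar(q_prime):
        x_path = lambda t: F(t, q_prime(t))          # compose F with the path
        def local_tuple(t):
            xp, xm = x_path(t + h), x_path(t - h)
            return (t, x_path(t), tuple((a - b) / (2 * h) for a, b in zip(xp, xm)))
        return local_tuple
    return gamma_bar(f_bar)

t, x, v = F_to_C(p_to_r)(0.0, (2.0, 0.5), (0.3, -0.7))
print(x, v)
```

The printed velocity agrees, to finite-difference accuracy, with the components (−r θ̇ sin θ + ṙ cos θ, r θ̇ cos θ + ṙ sin θ) displayed above.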

We can also use Γ̄ to make an elegant formula for computing the total time derivative DₜF of the function F:

$$ D_t F = \bar\Gamma(\bar G), \quad\text{with}\quad \bar G[q] = D(F \circ \Gamma[q]). \tag{1.168} $$


The implementation of the total time derivative as a program 
follows this definition. Given a procedure F implementing a local- 
tuple function and a path q we can construct a new procedure 
(compose F (Gamma q)). The procedure G-bar implements the 
derivative of this function of time. We then abstract this off the 
path with Gamma-bar to give the total time derivative. 


(define (Dt F) 
(define (G-bar q) 
(D (compose F (Gamma q)))) 
(Gamma-bar G-bar)) 
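A numerical analogue of Dt, again a Python sketch under our own assumptions: scalar coordinates, a second-order osculating path (so the local tuple must include the acceleration), and central differences with a fixed step standing in for exact derivatives. For F = ½ m v² the result should be the familiar m v a:

```python
def osculating_path(t0, q, v, a):
    """Second-order osculating path q + v (t - t0) + a (t - t0)^2 / 2."""
    return lambda t: q + v * (t - t0) + 0.5 * a * (t - t0)**2

def derivative(f, h=1e-3):
    """Central-difference derivative of a function of time."""
    return lambda t: (f(t + h) - f(t - h)) / (2 * h)

def Gamma(q):
    """Local tuple (t, q(t), Dq(t)) of a scalar path."""
    return lambda t: (t, q(t), derivative(q)(t))

def Dt(F):
    """Total time derivative of a local-tuple function F(t, q, v)."""
    def G_bar(q):
        return derivative(lambda t: F(*Gamma(q)(t)))   # D(F o Gamma[q])
    return lambda t, q, v, a: G_bar(osculating_path(t, q, v, a))(t)

m = 3.0
kinetic = lambda t, q, v: 0.5 * m * v * v
print(Dt(kinetic)(1.0, 0.2, 0.5, -2.0))   # about m * v * a = -3.0
```

For this quadratic example the central differences are exact up to rounding; for general functions the step size limits the accuracy.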
Exercise 1.31: Velocity transformation


Use the procedure Gamma-bar to construct a procedure that transforms 
velocities given a coordinate transformation. Apply this procedure to 
the procedure p->r to deduce (again) equation (1.65). 


Exercise 1.32: Path functions and state functions 


The local-tuple function f is the same as the local-tuple function Γ̄(f̄)
where f̄[q] = f ∘ Γ[q]. On the other hand, the path function f̄[q] and the
path function Γ̄(f̄) ∘ Γ[q] are not necessarily the same. Explain. Give
examples where they are the same and where they are not the same.
Write programs to illustrate the behavior.


Lagrange equations at a moment 
Given a Lagrangian, the Lagrange equations test paths for whether 
they are realizable paths of the system. The Lagrange equations 
relate the path and its derivatives. The fact that the Lagrange 
equations must be satisfied at each moment suggests that we can 
abstract the Lagrange equations off the path and write them as 
relations among the local-tuple components of realizable paths. 
Let Ē[L] be the path-dependent function that produces the residuals of the Lagrange equations (1.18) for the Lagrangian L:

$$ \bar E[L][q] = D(\partial_2 L \circ \Gamma[q]) - \partial_1 L \circ \Gamma[q]. \tag{1.169} $$

Realizable paths q satisfy the Lagrange equations

$$ \bar E[L][q] = 0. \tag{1.170} $$


The path-dependent Lagrange equations can be converted to local Lagrange equations using Γ̄:

$$ E[L] = \bar\Gamma(\bar E[L]). \tag{1.171} $$


The operator E is called the Euler-Lagrange operator. In terms of this operator the Lagrange equations are

$$ E[L] \circ \Gamma[q] = 0. \tag{1.172} $$

Applying the definition (1.167) of Γ̄,


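$$ \begin{aligned} E[L](t, q, v, \ldots) &= \bar\Gamma(\bar E[L])(t, q, v, \ldots) \\ &= D\big(\partial_2 L \circ \Gamma[O(t, q, v, \ldots)]\big)(t) - \partial_1 L \circ \Gamma[O(t, q, v, \ldots)](t) \\ &= (D_t(\partial_2 L))(t, q, v, \ldots) - \partial_1 L(t, q, v, \ldots) \\ &= (D_t \partial_2 L - \partial_1 L)(t, q, v, \ldots). \end{aligned} \tag{1.173} $$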


So the Euler-Lagrange operator is explicitly

$$ E[L] = D_t \partial_2 L - \partial_1 L. \tag{1.174} $$

The procedure Euler-Lagrange-operator implements E:


(define (Euler-Lagrange-operator L)
  (- (Dt ((partial 2) L)) ((partial 1) L)))


For example, applied to the Lagrangian for the harmonic oscillator,

(print-expression
 ((Euler-Lagrange-operator
   (L-harmonic 'm 'k))
  (->local 't 'x 'v 'a)))
(+ (* a m) (* k x))


Notice that the components of the local tuple are individually specified. Using equation (1.172), the Lagrange equations for the harmonic oscillator are:⁸⁸

(print-expression
 ((compose
   (Euler-Lagrange-operator (L-harmonic 'm 'k))
   (Gamma (literal-function 'x) 4))
  't))
(+ (* k (x t)) (* m (((expt D 2) x) t)))
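A numerical sketch of the same check in Python (hypothetical helper names; derivatives of the trial path are estimated by central differences). Using the local form E[L](t, x, v, a) = m a + k x printed above, a realizable path of the oscillator gives a residual near zero, while a path with the wrong frequency does not:

```python
import math

def euler_lagrange_harmonic(m, k):
    """E[L] = Dt(d2 L) - d1 L for L = m v^2/2 - k x^2/2, i.e. m a + k x."""
    return lambda t, x, v, a: m * a + k * x

def residual_on_path(m, k, path, t, h=1e-3):
    """Compose E[L] with the local tuple of path at time t."""
    x = path(t)
    v = (path(t + h) - path(t - h)) / (2 * h)
    a = (path(t + h) - 2 * x + path(t - h)) / (h * h)
    return euler_lagrange_harmonic(m, k)(t, x, v, a)

m, k = 2.0, 8.0                      # natural frequency sqrt(k/m) = 2
good = lambda t: math.cos(2.0 * t)   # realizable
bad = lambda t: math.cos(3.0 * t)    # not realizable
print(residual_on_path(m, k, good, 0.4), residual_on_path(m, k, bad, 0.4))
```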


Exercise 1.33: Properties of E 


Let F and G be two Lagrangian-like functions of a local tuple, C be a 
local-tuple transformation function, and c a constant. Demonstrate the 
following properties: 


a. E[F + G] = E[F] + E[G]

b. E[cF] = c E[F]

c. E[FG] = E[F] G + F E[G] + (DₜF) ∂₂G + ∂₂F (DₜG)

d. E[F ∘ C] = Dₜ(DF ∘ C) ∂₂C + (DF ∘ C) E[C]


88 Notice that Gamma has one more argument than it usually has. This argument
gives the length of the initial segment of the local tuple needed. The default 
length is 3, giving components of the local tuple up to and including the 
velocities. 
1.10 Constrained Motion


An advantage of the Lagrangian approach is that the coordinates 
can often be chosen to exactly describe the freedom of the sys- 
tem, automatically incorporating any constraints. We may also 
use coordinates that have more freedom than the system actu- 
ally has and consider explicit constraints among the coordinates. 
For example, the planar pendulum has a one-dimensional config- 
uration space. We have formulated this problem using the angle 
from the vertical as the configuration coordinate. Alternatively, 
we may choose to represent the pendulum as a body moving in 
the plane, constrained to be on the circle of the correct radius 
around the pivot. We would like to have valid descriptions for 
both choices and show they are equivalent. In this section we 
develop tools to handle problems with explicit constraints. The 
constraints considered here are more general than those consid- 
ered in the demonstration that the Lagrangian for systems with 
rigid constraints can be written as the difference of kinetic and 
potential energies (see section 1.6.2). 

Suppose the configuration of a system with n degrees of freedom 
is specified by n + 1 coordinates and that configuration paths q 
are constrained to satisfy some relation of the form 


$$ \varphi(t, q(t), Dq(t)) = 0. \tag{1.175} $$


How do we formulate the equations of motion? One approach 
would be to use the constraint equation to eliminate one of the 
coordinates in favor of the rest, and then the evolution of the 
reduced set of generalized coordinates would be described by the 
usual Lagrange equations. The equations governing the evolution 
of coordinates that are not fully independent should be equivalent. 

We can address the problem of formulating equations of mo- 
tion for systems with redundant coordinates by returning to the 
action principle. Realizable paths are distinguished from other 
paths by having stationary action. Stationary refers to the fact 
that the action does not change with certain small variations of 
the path. What variations should be considered? We have seen 
that velocity-independent rigid constraints can be used to elim- 
inate redundant coordinates. In the irredundant coordinates we 
distinguished realizable paths using variations that by construc- 
tion satisfy the constraints. Thus in the case where constraints 
can be used to eliminate redundant coordinates we can restrict
the variations in the path to those that are consistent with the
constraints.

So how does the restriction of the possible variations affect the 
argument that led to Lagrange’s equations (refer to section 1.5)? 
Actually most of the calculation is unaffected. The condition that 
the action is stationary still reduces to the condition (1.34): 


$$ 0 = \int_{t_1}^{t_2} \big\{ (\partial_1 L \circ \Gamma[q]) - D(\partial_2 L \circ \Gamma[q]) \big\}\, \eta. \tag{1.176} $$


At this point we argued that because the variations η are arbitrary (except for conditions at the endpoints), the only way for the integral to be zero is for the integrand to be zero. Furthermore, the freedom in our choice of η allowed us to deduce that the factor multiplying η in the integrand must be identically zero, thereby deriving Lagrange’s equations.

Now the choice of η is not completely free. We may still deduce from the arbitrariness of η that the integrand must be zero,⁸⁹ but we may no longer deduce that the factor multiplying η is zero (only that the projection of this factor onto acceptable variations is zero). So we have

$$ \big\{ (\partial_1 L \circ \Gamma[q]) - D(\partial_2 L \circ \Gamma[q]) \big\}\, \eta = 0, \tag{1.177} $$

with η subject to the constraints.

A path q satisfies the constraint if φ̄[q] = φ ∘ Γ[q] = 0. The constraint must be satisfied even for the varied path, so we only allow variations η for which the variation of the constraint is zero:

$$ \delta_\eta \bar\varphi = 0. \tag{1.178} $$

We can say that the variation must be “tangent” to the constraint surface. Expanding this with the chain rule, a variation η is tangent to the constraint surface if

$$ (\partial_1 \varphi \circ \Gamma[q])\, \eta + (\partial_2 \varphi \circ \Gamma[q])\, D\eta = 0. \tag{1.179} $$


89 Given any acceptable variation we may make another acceptable variation by
multiplying the given one by a bump function that emphasizes any particular 
time interval. 
Note that these are functions of time; the variation at a given time
is tangent to the constraint at that time.


1.10.1 Coordinate Constraints 


Consider constraints that do not depend on velocities:

$$ \partial_2 \varphi = 0. $$

In this case the variation is tangent to the constraint surface if

$$ (\partial_1 \varphi \circ \Gamma[q])\, \eta = 0. \tag{1.180} $$


Together, equations (1.177) and (1.180) should determine the motion, but how do we eliminate η? The residual of the Lagrange equations is orthogonal⁹⁰ to any η that is orthogonal to the normal to the constraint surface. A vector that is orthogonal to all vectors orthogonal to a given vector is parallel to the given vector. Thus, the residual of Lagrange’s equations is parallel to the normal to the constraint surface; the two must be proportional:

$$ D(\partial_2 L \circ \Gamma[q]) - \partial_1 L \circ \Gamma[q] = \lambda\, (\partial_1 \varphi \circ \Gamma[q]). \tag{1.181} $$

That the two vectors are parallel everywhere along the path does not guarantee that the proportionality factor is the same at each moment along the path, so the proportionality factor λ is some function of time, which may depend on the path under consideration. These equations, with the constraint equation φ ∘ Γ[q] = 0, are the governing equations. These equations are sufficient to determine the path q and to eliminate the unknown function λ.


Now watch this
Suppose we form an augmented Lagrangian treating λ as one of the coordinates:

$$ L'(t; q, \lambda; \dot q, \dot\lambda) = L(t, q, \dot q) + \lambda\, \varphi(t, q). \tag{1.182} $$

The Lagrange equations associated with the coordinates q are just the modified Lagrange equations (1.181), and the Lagrange equation associated with λ is just the constraint equation. (Note that λ̇ does not appear in the augmented Lagrangian.) So the Lagrange equations for this augmented Lagrangian fully encapsulate the modification to the Lagrange equations that is imposed by the addition of an explicit coordinate constraint, at the expense of introducing extra degrees of freedom. Notice that this Lagrangian is of the same form as Lagrangian (1.89) that we used in the derivation of L = T − V for rigid systems (section 1.6.2).

90 We take two tuple-valued functions of time to be orthogonal if at each instant the dot product of the tuples is zero. Similarly, tuple-valued functions are considered parallel if at each moment one of the tuples is a scalar multiple of the other. The scalar multiplier is in general a function of time.


Alternatively 

How do we know that we have enough information to eliminate the unknown function λ from equations (1.181) or that the extra degree of freedom introduced in Lagrangian (1.182) is purely formal?

If λ could be written as a function of the solution state path, then it would be clear that it is determined by the state and can thus be eliminated. Okay, suppose λ can be written as a composition of a state-dependent function with the path: λ = Λ ∘ Γ[q]. Consider the Lagrangian

$$ L'' = L + \Lambda \varphi. \tag{1.183} $$


This new Lagrangian has no extra degrees of freedom. The Lagrange equations for L″ are the Lagrange equations for L with additional terms arising from the product Λφ. Applying the Euler-Lagrange operator E (see section 1.9) to this Lagrangian gives⁹¹

$$ \begin{aligned} E[L''] &= E[L] + E[\Lambda\varphi] \\ &= E[L] + \Lambda\, E[\varphi] + E[\Lambda]\,\varphi + D_t\Lambda\, \partial_2\varphi + \partial_2\Lambda\, D_t\varphi. \end{aligned} \tag{1.184} $$

Composition of E[L″] with Γ[q] gives the Lagrange equations for the path q. Using the fact that the constraint is satisfied on the path, φ ∘ Γ[q] = 0, and consequently Dₜφ ∘ Γ[q] = 0, we have

$$ E[L''] \circ \Gamma[q] = E[L] \circ \Gamma[q] + \lambda\, (E[\varphi] \circ \Gamma[q]) + D\lambda\, (\partial_2 \varphi \circ \Gamma[q]), \tag{1.185} $$


91 Recall that the Euler-Lagrange operator E has the property
E[FG] = F E[G] + E[F] G + DₜF ∂₂G + ∂₂F DₜG.




Figure 1.8 We can formulate the behavior of a pendulum as motion 
in the plane, constrained to a circle about the pivot. 


where we have used λ = Λ ∘ Γ[q]. If we now use the fact that we are only dealing with coordinate constraints, ∂₂φ = 0, then

$$ E[L''] \circ \Gamma[q] = E[L] \circ \Gamma[q] + \lambda\, (E[\varphi] \circ \Gamma[q]). \tag{1.186} $$


The Lagrange equations are the same as those derived from the augmented Lagrangian L′. The difference is that now we see that λ = Λ ∘ Γ[q] is determined by the unaugmented state. This is the same as saying that λ can be eliminated.

Considering only the formal validity of the Lagrange equations for the augmented Lagrangian, we could not deduce that λ could be written as the composition of a state-dependent function Λ with Γ[q]. The explicit Lagrange equations derived from the augmented Lagrangian depend on the accelerations D²q as well as λ, so we may not deduce separately that either is the composition of a state-dependent function and Γ[q]. However, now we see that λ is such a composition. This allows us to deduce that D²q is also a state-dependent function composed with the path. The evolution of the system is determined from the dynamical state.


The pendulum using constraints 
The pendulum can be formulated as the motion of a massive par- 
ticle in a vertical plane subject to the constraint that the distance 
to the pivot is constant (see figure 1.8). 

In this formulation, the kinetic and potential energies in the 
Lagrangian are those of an unconstrained particle in a uniform 
gravitational acceleration. A Lagrangian for the unconstrained particle is

$$ L(t; x, y; v_x, v_y) = \tfrac{1}{2} m (v_x^2 + v_y^2) - m g y. \tag{1.187} $$


The constraint that the pendulum moves in a circle of radius l about the pivot is⁹²

$$ x^2 + y^2 - l^2 = 0. \tag{1.188} $$

The augmented Lagrangian is

$$ L(t; x, y, \lambda; v_x, v_y, \dot\lambda) = \tfrac{1}{2} m (v_x^2 + v_y^2) - m g y + \lambda (x^2 + y^2 - l^2). \tag{1.189} $$

The Lagrange equations for the augmented Lagrangian are

$$ m D^2 x - 2\lambda x = 0 \tag{1.190} $$

$$ m D^2 y + m g - 2\lambda y = 0 \tag{1.191} $$

$$ x^2 + y^2 - l^2 = 0. \tag{1.192} $$


These equations are sufficient to solve for the motion of the pendulum.

It should not be surprising that these equations simplify if we switch to “polar” coordinates

$$ x = r \sin\theta, \qquad y = -r \cos\theta. \tag{1.193} $$

Substituting this into the constraint equation we determine that r = l, a constant. Forming the derivatives and substituting into the other two equations we find

$$ m l \big(\cos\theta\, D^2\theta - \sin\theta\, (D\theta)^2\big) - 2\lambda l \sin\theta = 0 \tag{1.194} $$

$$ m l \big(\sin\theta\, D^2\theta + \cos\theta\, (D\theta)^2\big) + m g + 2\lambda l \cos\theta = 0. \tag{1.195} $$


Multiplying the first by cos θ and the second by sin θ and adding, we find

$$ m l D^2\theta + m g \sin\theta = 0, \tag{1.196} $$


92 This constraint has the same form as the constraints used in the demonstration that L = T − V can be used for rigid systems. Here it is a particular example of a more general set of constraints.




which we recognize as the correct equation for the pendulum. This is the same as the Lagrange equation for the pendulum using the unconstrained generalized coordinate θ. For completeness, we can find λ in terms of the other variables:

$$ \lambda = -\frac{1}{2l}\big(m g \cos\theta + m l (D\theta)^2\big). \tag{1.197} $$

This confirms that λ is really the composition of a function of the state with the state path. Notice that 2lλ is a force: up to sign, it is the sum of the outward component of the gravitational force and the centrifugal force. Using this interpretation in the two coordinate equations of motion we see that the terms involving λ are the forces that must be applied to the unconstrained particle to make it move on the circle required by the constraints. Equivalently, we may think of 2lλ as the tension in the pendulum rod that holds the mass.⁹³
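A numerical spot check of these relations, as a Python sketch (the function name and sample values are ours): take λ from (1.197), use it in the constrained Cartesian equations (1.190) and (1.191), and compare the resulting accelerations with those obtained by differentiating x = l sin θ, y = −l cos θ along a solution of (1.196):

```python
import math

def compare_accelerations(m, g, l, theta, thetadot):
    """Return Cartesian accelerations from the constrained equations and
    from the angle equation, for the same state (theta, thetadot)."""
    lam = -(m * g * math.cos(theta) + m * l * thetadot**2) / (2 * l)   # (1.197)
    x, y = l * math.sin(theta), -l * math.cos(theta)
    # accelerations from (1.190) and (1.191)
    a_constrained = (2 * lam * x / m, (2 * lam * y - m * g) / m)
    # accelerations implied by the angle equation (1.196)
    thetaddot = -(g / l) * math.sin(theta)
    a_polar = (l * (math.cos(theta) * thetaddot - math.sin(theta) * thetadot**2),
               l * (math.sin(theta) * thetaddot + math.cos(theta) * thetadot**2))
    return a_constrained, a_polar

print(compare_accelerations(1.3, 9.8, 0.7, 0.9, 1.7))
```

The two pairs agree, which is another way of seeing that λ is fixed by the state (θ, Dθ).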


Building systems from parts 

The method of using augmented Lagrangians to enforce con- 
straints on dynamical systems provides us with a way of building 
the analysis of a compound system by combining the results of 
the analysis of the parts of the system and the coupling between 
them. 

Consider the compound spring-mass system shown at the top of figure 1.9. We could analyze this as a monolithic system with two configuration coordinates x₁ and x₂, representing the extensions of the springs from their equilibrium lengths X₁ and X₂.

An alternative procedure is to break the system into several parts. In our spring-mass system we can choose two parts: one is a spring and mass attached to the wall, and the other is a spring and mass with its attachment point at an additional configuration coordinate ξ. We can formulate a Lagrangian for each part separately. We can then choose a Lagrangian for the composite system as the sum of the two component Lagrangians with a constraint ξ = X₁ + x₁ to accomplish the coupling.


93 Indeed, if we had scaled the constraint equations as we did in the discussion of Newtonian constraint forces, we could have identified λ with the magnitude of the constraint force F. However, though λ will in general be related to the constraint forces, it will not be one of them. We chose to leave the scaling as it naturally appeared rather than make things turn out artificially pretty.
Figure 1.9 A compound spring-mass system is decomposed into two subsystems. We have two springs and masses that may only move horizontally. The equilibrium positions of the springs are X₁ and X₂. The systems are coupled by the position-coordinate constraint ξ = X₁ + x₁.


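Let’s see how this works. The Lagrangian for the subsystem attached to the wall is

$$ L_1(t; x_1; \dot x_1) = \tfrac{1}{2} m_1 \dot x_1^2 - \tfrac{1}{2} k_1 x_1^2, \tag{1.198} $$

and the Lagrangian for the subsystem that attaches to it is

$$ L_2(t; \xi, x_2; \dot\xi, \dot x_2) = \tfrac{1}{2} m_2 (\dot\xi + \dot x_2)^2 - \tfrac{1}{2} k_2 x_2^2. \tag{1.199} $$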


We construct a Lagrangian for the system composed from these parts as a sum of the Lagrangians for each of the separate parts, with a coupling term to enforce the constraint:

$$ \begin{aligned} L(t; x_1, x_2, \xi, \lambda; \dot x_1, \dot x_2, \dot\xi, \dot\lambda) &= L_1(t; x_1; \dot x_1) + L_2(t; \xi, x_2; \dot\xi, \dot x_2) \\ &\quad + \lambda\big(\xi - (X_1 + x_1)\big). \end{aligned} \tag{1.200} $$

Thus we can write Lagrange’s equations for the four configuration coordinates, in order, as follows:

$$ m_1 D^2 x_1 = -k_1 x_1 - \lambda \tag{1.201} $$

$$ m_2 (D^2\xi + D^2 x_2) = -k_2 x_2 \tag{1.202} $$

$$ m_2 (D^2\xi + D^2 x_2) = \lambda \tag{1.203} $$

$$ 0 = \xi - (X_1 + x_1) \tag{1.204} $$

Notice that in this system λ is the force of constraint, holding the system together. We can now eliminate the “glue” coordinates ξ and λ to obtain the equations of motion in the coordinates x₁ and x₂:

$$ m_1 D^2 x_1 + m_2 (D^2 x_1 + D^2 x_2) + k_1 x_1 = 0 \tag{1.205} $$

$$ m_2 (D^2 x_1 + D^2 x_2) + k_2 x_2 = 0 \tag{1.206} $$
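Equations (1.205) and (1.206) can be solved for the accelerations and integrated. The Python sketch below uses parameter values, names, and a fixed-step Runge-Kutta integrator of our own choosing. It also checks two consequences of the construction: the glue force λ = m₂(D²ξ + D²x₂) equals −k₂x₂ (from the second and third Lagrange equations), and the energy ½m₁ẋ₁² + ½m₂(ẋ₁ + ẋ₂)² + ½k₁x₁² + ½k₂x₂² stays constant up to integration error:

```python
def accelerations(m1, m2, k1, k2, x1, x2):
    """Solve (1.205)-(1.206) for (D^2 x1, D^2 x2)."""
    a1 = (k2 * x2 - k1 * x1) / m1        # (1.205) using m2 (a1 + a2) = -k2 x2
    a2 = -k2 * x2 / m2 - a1              # (1.206)
    return a1, a2

def energy(m1, m2, k1, k2, state):
    x1, x2, v1, v2 = state
    return (0.5 * m1 * v1**2 + 0.5 * m2 * (v1 + v2)**2
            + 0.5 * k1 * x1**2 + 0.5 * k2 * x2**2)

def simulate(params, state, dt=1e-3, steps=2000):
    """Fixed-step RK4 for the state (x1, x2, v1, v2)."""
    def f(s):
        x1, x2, v1, v2 = s
        a1, a2 = accelerations(*params, x1, x2)
        return (v1, v2, a1, a2)
    def shift(s, ds, c):
        return tuple(si + c * di for si, di in zip(s, ds))
    for _ in range(steps):
        d1 = f(state)
        d2 = f(shift(state, d1, dt / 2))
        d3 = f(shift(state, d2, dt / 2))
        d4 = f(shift(state, d3, dt))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, d1, d2, d3, d4))
    return state

params = (1.0, 2.0, 3.0, 5.0)            # m1, m2, k1, k2
state0 = (0.1, -0.2, 0.0, 0.3)           # x1, x2, v1, v2
a1, a2 = accelerations(*params, 0.1, -0.2)
print(params[1] * (a1 + a2), -params[3] * (-0.2))   # lambda computed two ways
state1 = simulate(params, state0)
print(energy(*params, state0), energy(*params, state1))
```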


This strategy can be generalized. We can make a library of 
primitive components. Each component may be characterized by 
a Lagrangian with additional degrees of freedom for the terminals 
where that component may be attached to others. We then can 
construct composite Lagrangians by combining components using 
constraints to glue together the terminals. 


Exercise 1.34: Combining Lagrangians 


a. Make another primitive component that is compatible with the spring- 
mass structures described in this section. For example, make a pendu- 
lum that can attach to the spring-mass system. Build a combination 
and derive the equations of motion. Be careful: the algebra is horrible
if you choose bad coordinates.


b. For a nice little project, construct a family of compatible mechanical 
parts, characterized by appropriate Lagrangians, that can be combined 
in a variety of ways to make interesting mechanisms. Remember that in 
a good language the result of combining pieces should be a piece of the 
same kind that can be further combined with other pieces. 
Exercise 1.35: Bead on a triaxial surface


Consider again the motion of a bead constrained to move on a triaxial 
surface from exercise 1.18. Reformulate this using rectangular coordi- 
nates as the generalized coordinates with an explicit constraint that the 
bead stay on the surface. Find a Lagrangian and show that the Lagrange 
equations are equivalent to those found in exercise 1.18. 


Exercise 1.36: Motion of a tiny golf ball 


Consider the motion of a golf ball idealized as a point mass constrained 
to a frictionless smooth surface of varying height h(x, y) in a uniform 
gravitational field with acceleration g. 


a. Find an augmented Lagrangian for this system, and derive the equa- 
tions governing the motion of the point mass in x and y. 


b. Under what conditions is this approximated by a potential function
$V(x, y) = m g h(x, y)$?


c. Assume that we have an $h(x, y)$ that is axisymmetric about $x = y = 0$. Can you find such an $h$ that yields motions with closed orbits?


1.10.2 Derivative Constraints 


Here we investigate velocity-dependent constraints that are “to- 
tal time derivatives” of velocity independent constraints. The 
methods presented so far do not apply because the constraint is 
velocity-dependent. 

Consider a velocity-dependent constraint $\psi = 0$. That $\psi$ is a total time derivative means that there exists a velocity-independent function $\varphi$ such that


$$\psi \circ \Gamma[q] = D(\varphi \circ \Gamma[q]). \qquad (1.207)$$


That $\varphi$ is velocity independent means $\partial_2 \varphi = 0$. As state functions the relationship between $\psi$ and $\varphi$ is


$$\psi = D_t \varphi = \partial_0 \varphi + \partial_1 \varphi\, \dot{Q}. \qquad (1.208)$$


Given a $\psi$ we can find $\varphi$ by solving this linear partial differential equation. The solution is determined up to an additive constant, so $\psi = 0$ implies $\varphi = K$ for some constant $K$. On the other hand, if we knew $\varphi = K$ then $\psi = 0$ follows. Thus the velocity-dependent constraint $\psi = 0$ is equivalent to the velocity-independent constraint $\varphi = K$, and we know how to find Lagrange equations for such systems.
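To see (1.207), (1.208), and (1.211) concretely, the following Python sketch builds $\psi$ from a velocity-independent $\varphi$ by the chain rule and compares it with the time derivative of $\varphi$ along a path. The particular $\varphi(t, q) = q \cos t + q^3$ and path $q(t) = \sin 2t$ are hypothetical choices, and all partial derivatives are approximated by central differences, so agreement is only up to finite-difference error.

```python
import math


def partial(f, i, args, h=1e-5):
    """Central-difference approximation to the partial derivative of f
    with respect to argument i, evaluated at args."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def phi(t, q):
    # Hypothetical velocity-independent constraint function.
    return q * math.cos(t) + q ** 3


def psi(t, q, v):
    # Equation (1.208): psi = d0 phi + d1 phi * v.
    return partial(phi, 0, (t, q)) + partial(phi, 1, (t, q)) * v


def path(t):
    return math.sin(2 * t)                 # hypothetical path q(t)


def path_velocity(t):
    return 2 * math.cos(2 * t)             # its derivative Dq(t)


t = 0.7
lhs = psi(t, path(t), path_velocity(t))                     # psi o Gamma[q]
rhs = partial(lambda s: phi(s, path(s)), 0, (t,))           # D(phi o Gamma[q])
print("(1.207):", lhs, rhs)
print("(1.211):", partial(psi, 2, (t, 0.3, 1.1)), partial(phi, 1, (t, 0.3)))
```

The second printed pair illustrates that differentiating $\psi$ with respect to velocity recovers $\partial_1 \varphi$, which is why the Lagrange equations can be written in terms of $\psi$ alone.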
If $L$ is a Lagrangian for the unconstrained problem, the Lagrange equations with the constraint $\varphi = K$ are


$$(\mathcal{E}[L] + \lambda\, \mathcal{E}[\varphi]) \circ \Gamma[q] = 0, \qquad (1.209)$$


where $\lambda$ is a function of time that will be eliminated during the solution process. The constant $K$ does not affect the Lagrange equations. The function $\varphi$ is velocity independent, $\partial_2 \varphi = 0$, so $\mathcal{E}[\varphi] = -\partial_1 \varphi$ and the Lagrange equations become


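$$(\mathcal{E}[L] - \lambda\, \partial_1 \varphi) \circ \Gamma[q] = 0. \qquad (1.210)$$
From equation (1.208) we see that
$$\partial_1 \varphi = \partial_2 \psi, \qquad (1.211)$$
so the Lagrange equations with the constraint $\psi = 0$ are
$$\mathcal{E}[L] \circ \Gamma[q] = \lambda\, \partial_2 \psi \circ \Gamma[q]. \qquad (1.212)$$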


The important feature is that we can write the Lagrange equations directly in terms of $\psi$ without having to produce the integral $\varphi$. Of course the validity of these Lagrange equations depends on the existence of the integral $\varphi$.

It turns out that the augmented Lagrangian trick also works here. These Lagrange equations are given if we augment the Lagrangian with the constraint $\psi$ multiplied by a function of time $\lambda'$:

$$L' = L + \lambda' \psi. \qquad (1.213)$$

The Lagrange equations for $L'$ turn out to be

$$\mathcal{E}[L] \circ \Gamma[q] = -D\lambda'\, \partial_2 \psi \circ \Gamma[q], \qquad (1.214)$$

which, with the identification $\lambda = -D\lambda'$, are the same as Lagrange equations (1.212).

Sometimes a problem is naturally formulated in terms of velocity- 
dependent constraints. The formalism we have developed will 
handle any velocity-dependent constraint that can be written in 
terms of the derivative of a coordinate constraint. Such a con- 
straint is called an integrable constraint. Any system for which 
the constraints can be put in the form of a coordinate constraint, 
or are already in that form, is called a holonomic system. 
Figure 1.10 A massive hoop rolling, without slipping, down an inclined plane.


Exercise 1.37: 


Show that the augmented Lagrangian (1.213) does lead to the Lagrange equations (1.214), taking into account the fact that $\psi$ is a total time derivative of $\varphi$.


Goldstein’s hoop 
Here we consider a problem for which the constraint can be rep- 
resented as a time derivative of a coordinate constraint: a hoop 
of mass M rolling, without slipping, down a (one-dimensional) 
inclined plane (see figure 1.10). 

We will formulate this problem in terms of the two coordinates $\theta$, the rotation of an arbitrary point on the hoop from an arbitrary reference direction, and $x$, the linear progress down the inclined plane. The constraint is that the hoop does not slip. Thus a change in $\theta$ is exactly reflected in a change in $x$; the constraint function is:


$$\psi(t; x, \theta; \dot{x}, \dot{\theta}) = R\dot{\theta} - \dot{x}. \qquad (1.215)$$


This constraint is phrased as a relation among generalized velocities, but it could be integrated to get $x = R\theta + c$. We may form our augmented Lagrangian with either the integrated constraint or its derivative.


This example appears in [18], pages 49–51.
The kinetic energy has two parts, the energy of rotation of the hoop and the energy of the motion of its center of mass.²⁵ The potential energy of the hoop decreases as the height decreases. Thus we may write the augmented Lagrangian:


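$$\begin{aligned}
&L(t; x, \theta, \lambda; \dot{x}, \dot{\theta}, \dot{\lambda}) \\
&\quad = \tfrac{1}{2} M R^2 \dot{\theta}^2 + \tfrac{1}{2} M \dot{x}^2 + M g x \sin\phi + \lambda\,(R\dot{\theta} - \dot{x}). \qquad (1.216)
\end{aligned}$$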


Lagrange’s equations are

$$M D^2 x - D\lambda = M g \sin\phi \qquad (1.217)$$
$$M R^2 D^2 \theta + R\, D\lambda = 0 \qquad (1.218)$$
$$R\, D\theta - Dx = 0. \qquad (1.219)$$


And by differentiation of the third Lagrange equation we obtain

$$D^2 x = R\, D^2 \theta. \qquad (1.220)$$


By combining these equations we can solve for the dynamical quantities of interest. For this case of a rolling hoop the linear acceleration

$$D^2 x = \tfrac{1}{2} g \sin\phi \qquad (1.221)$$

is just half of what it would have been if the mass had just slid down a frictionless plane without rotating. Note that for this hoop $D^2 x$ is independent of both $M$ and $R$. We see from the Lagrange equations that $D\lambda$ can be interpreted as the friction force involved in enforcing the constraint. The frictional force of constraint is

$$D\lambda = -\tfrac{1}{2} M g \sin\phi, \qquad (1.222)$$

where the minus sign indicates that it acts up the incline, and the angular acceleration is

$$D^2 \theta = \frac{g}{2R} \sin\phi. \qquad (1.223)$$
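Combining the equations amounts to solving a linear system for $D^2 x$, $D^2 \theta$, and $D\lambda$. The Python sketch below does this with a small hand-written Gaussian elimination; the numerical values of $M$, $R$, $g$, and the incline angle are illustrative assumptions. With the sign conventions of (1.217) and (1.218), the solve returns $D^2 x = \tfrac{1}{2} g \sin\phi$, $D^2 \theta = g \sin\phi / (2R)$, and a negative $D\lambda$ of magnitude $\tfrac{1}{2} M g \sin\phi$.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def rolling_hoop(M, R, g, phi):
    """Return (D^2 x, D^2 theta, D lambda) from (1.217), (1.218), (1.220)."""
    A = [[M, 0.0, -1.0],         # M D^2x - D lambda = M g sin(phi)
         [0.0, M * R ** 2, R],   # M R^2 D^2 theta + R D lambda = 0
         [1.0, -R, 0.0]]         # D^2 x - R D^2 theta = 0
    b = [M * g * math.sin(phi), 0.0, 0.0]
    return solve_linear(A, b)


# Illustrative values: 2 kg hoop, 0.3 m radius, 30 degree incline.
acc, ang_acc, friction = rolling_hoop(2.0, 0.3, 9.8, math.radians(30))
print(acc, ang_acc, friction)
```

Changing `M` or `R` in the call leaves `acc` unchanged, in line with the remark that $D^2 x$ is independent of both.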


²⁵We will see in chapter 2 how to compute the kinetic energy of rotation, but for now the answer is $\tfrac{1}{2} M R^2 \dot{\theta}^2$.
1.10.3 Non-Holonomic Systems


Systems with constraints that are not integrable are termed non- 
holonomic systems. A constraint is not integrable if it cannot be 
written in terms of an equivalent coordinate constraint. An ex- 
ample of a non-holonomic system is a ball rolling without slipping 
in a bowl. As the ball rolls it must turn so that the surface of the 
ball does not move relative to the bowl at the point of contact. 
This looks like it might establish a relation between the location of 
the ball in the bowl and the orientation of the ball, but it doesn’t. 
The ball may return to the same place in the bowl with different 
orientations depending on the intervening path the ball has taken. 
As a consequence the constraints may not be used to eliminate any 
coordinates. 
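The ball-in-a-bowl constraint takes some work to simulate, so the Python sketch below uses a simpler non-holonomic system as a stand-in: an idealized upright disk of radius $R$ rolling without slipping on a plane, which may also pivot about the vertical through its contact point. This substitute example and the helper `roll_along` are assumptions for illustration only. Rolling a distance $s$ advances the spin angle by $s/R$ while moving the disk along its heading; pivoting changes only the heading. Two routes that return the disk to the same place with the same heading leave it with spin angles that differ by 2 rad, which is not a multiple of $2\pi$, so the spin angle cannot be a function of position alone.

```python
import math


def roll_along(moves, R=1.0):
    """Follow an idealized rolling disk (hypothetical stand-in example).

    moves: sequence of ("roll", distance) or ("turn", angle) steps.
    Returns (x, y, heading, spin). Rolling without slipping ties the
    distance travelled to the spin angle: ds = R * d(spin).
    """
    x = y = heading = spin = 0.0
    for kind, amount in moves:
        if kind == "roll":
            x += amount * math.cos(heading)
            y += amount * math.sin(heading)
            spin += amount / R
        elif kind == "turn":
            heading += amount        # pivoting does not change the spin
        else:
            raise ValueError(f"unknown move {kind!r}")
    return x, y, heading, spin


# Route A: around a unit square. Route B: out one unit and back.
square = [("roll", 1.0), ("turn", math.pi / 2)] * 4
out_and_back = [("roll", 1.0), ("turn", math.pi)] * 2

for name, route in [("square", square), ("out and back", out_and_back)]:
    x, y, heading, spin = roll_along(route)
    print(f"{name}: position ({x:.2e}, {y:.2e}), "
          f"cos(heading) {math.cos(heading):.3f}, spin {spin:.1f} rad")
```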

What are the equations of motion governing non-holonomic systems? For the restricted set of systems with non-holonomic constraints that are linear in the velocities, it is widely reported²⁶ that the equations of motion are the following. Let $\psi$ have the form


$$\psi(t, q, v) = G_1(t, q)\, v + G_2(t, q), \qquad (1.224)$$


a state function that is linear in the velocities. We assume $\psi$ is not a total time derivative. If $L$ is a Lagrangian for the unconstrained system, then the equations of motion are asserted to be


$$\mathcal{E}[L] \circ \Gamma[q] = \lambda\, G_1 \circ \Gamma[q] = \lambda\, \partial_2 \psi \circ \Gamma[q]. \qquad (1.225)$$


Together with the constraint $\psi = 0$ the system is closed and the evolution of the system is determined. Note that these equations are identical to the Lagrange equations (1.212) for the case that $\psi$ is a total time derivative, but here the derivation of those equations is no longer valid.

An essential step in the derivation of the Lagrange equations for coordinate constraints $\varphi = 0$ with $\partial_2 \varphi = 0$ was to note that two conditions must be satisfied


$$(\mathcal{E}[L] \circ \Gamma[q])\, \eta = 0, \qquad (1.226)$$


²⁶For some treatments of non-holonomic systems see, for example, Whittaker [43], Goldstein [18], Gantmakher [17], or Arnold et al. [6].
and

$$(\partial_1 \varphi \circ \Gamma[q])\, \eta = 0. \qquad (1.227)$$


Because $\mathcal{E}[L] \circ \Gamma[q]$ is orthogonal to $\eta$, and $\eta$ is constrained to be orthogonal to $\partial_1 \varphi \circ \Gamma[q]$, the two must be parallel at each moment:


$$\mathcal{E}[L] \circ \Gamma[q] = \lambda\, \partial_1 \varphi \circ \Gamma[q]. \qquad (1.228)$$


The Lagrange equations for derivative constraints were derived 
from this. 

This derivation does not go through if the constraint function is velocity dependent. In this case, for a variation $\eta$ to be consistent with the velocity-dependent constraint function $\psi$ it must satisfy (see equation 1.179)


$$(\partial_1 \psi \circ \Gamma[q])\, \eta + (\partial_2 \psi \circ \Gamma[q])\, D\eta = 0. \qquad (1.229)$$


We may no longer eliminate $\eta$ by the same argument, because $\eta$ is no longer orthogonal to $\partial_2 \psi \circ \Gamma[q]$, and we cannot rewrite the constraint as a coordinate constraint because $\psi$ is, by assumption, not integrable.

The following is the derivation of the non-holonomic equations from Arnold et al. [6], translated into our notation. Define the “virtual velocities” $\xi$ to be any velocity satisfying


$$(\partial_2 \psi \circ \Gamma[q])\, \xi = 0. \qquad (1.230)$$


The “principle of d’Alembert-Lagrange,” according to Arnold, 
states that 


$$(\mathcal{E}[L] \circ \Gamma[q])\, \xi = 0, \qquad (1.231)$$


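for any virtual velocity $\xi$. Because $\xi$ is arbitrary except that it is required to be orthogonal to $\partial_2 \psi \circ \Gamma[q]$, and any such $\xi$ is orthogonal to $\mathcal{E}[L] \circ \Gamma[q]$, it follows that $\partial_2 \psi \circ \Gamma[q]$ must be parallel to $\mathcal{E}[L] \circ \Gamma[q]$. So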


$$\mathcal{E}[L] \circ \Gamma[q] = \lambda\, \partial_2 \psi \circ \Gamma[q], \qquad (1.232)$$


which are the non-holonomic equations. 

To convert the stationary action equations to the equations of Arnold we must do the following. To get from equation (1.226) to equation (1.231), we must replace $\eta$ by $\xi$. However, to get from equation (1.229) to equation (1.230), we must set $\eta = 0$ and replace $D\eta$ by $\xi$. All “derivations” of the non-holonomic equations have similar identifications. It comes down to this: the non-holonomic equations do not follow from the action principle. They are something else. Whether they are correct or not depends on whether they agree with experiment.

For systems with coordinate constraints or derivative constraints we have found that the Lagrange equations can be derived from a Lagrangian that is augmented with the constraint. However, if the constraints are not integrable, the Lagrange equations for the augmented Lagrangian are not the same as those of the non-holonomic system (equations 1.225).²⁷ Let $L'$ be an augmented Lagrangian with non-integrable constraint $\psi$:


$$L'(t; q, \lambda; \dot{q}, \dot{\lambda}) = L(t, q, \dot{q}) + \lambda\, \psi(t, q, \dot{q}); \qquad (1.233)$$
then the Lagrange equations associated with the coordinates are: 


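$$\begin{aligned}
0 = {} & \mathcal{E}[L] \circ \Gamma[q] \\
& + D\lambda\, (\partial_2 \psi \circ \Gamma[q]) + \lambda\, D(\partial_2 \psi \circ \Gamma[q]) - \lambda\, (\partial_1 \psi \circ \Gamma[q]). \qquad (1.234)
\end{aligned}$$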


The Lagrange equation associated with A is just the constraint 
equation 


$$\psi \circ \Gamma[q] = 0. \qquad (1.235)$$


An interesting feature of these equations is that they involve both $\lambda$ and $D\lambda$. Thus the usual state variables $q$ and $Dq$, with the constraint, are not sufficient to determine a full set of initial conditions for the derived Lagrange equations; we need to specify an initial value for $\lambda$ as well.

In general, for any particular physical system, equations (1.225) 
and (1.234) are not the same, and in fact they have different so- 
lutions. It is not apparent that either set of equations accurately 
models the physical system. The first approach to non-holonomic 
systems is not justified by extension of the arguments for the holo- 
nomic case and the other is not fully determined. Perhaps this is 
an indication that the models are inadequate; that more details 
of how the constraints are maintained need to be specified. 


²⁷Arnold et al. [6] call the variational mechanics with the constraints added to the Lagrangian vakonomic mechanics.
1.11 Summary


To analyze a mechanical system we construct an action function 
that gives us a way to distinguish realizable motions from other 
conceivable motions of the system. The action function is con- 
structed so as to be stationary only on paths describing realizable 
motions, with respect to variations of the path. This is called the 
principle of stationary action. The principle of stationary action 
is a coordinate-independent specification of the realizable paths. 
For systems with or without constraints we may choose any sys- 
tem of coordinates that uniquely determines the configuration of 
the system. 

For a large variety of mechanical systems actions are integrals 
of a function, called the Lagrangian, along the path. For many 
systems an appropriate Lagrangian is the difference of the kinetic 
energy and the potential energy of the system. The choice of a 
Lagrangian for a system is not unique. 

For any system for which we have a Lagrangian action we can formulate a system of ordinary differential equations, the Lagrange
equations, that is satisfied by any realizable path. The method of
deriving the Lagrange equations from the Lagrangian is indepen- 
dent of the coordinate system used to formulate the Lagrangian. 
One freedom we have in formulation is that the addition of a to- 
tal time derivative to a Lagrangian for a system yields another 
Lagrangian that has the same Lagrange equations. 

The Lagrange equations are a set of ordinary differential equa- 
tions: there is a finite state that summarizes the history of the 
system and is sufficient to determine the future. There is an ef- 
fective procedure for evolving the motion of the system from a 
state at an instant. For many systems the state is determined by 
the coordinates and the rate of change of the coordinates at an 
instant. 

If there are continuous symmetries in a physical system there 
are conserved quantities associated with them. If the system can 
be formulated in such a way that the symmetries are manifest in 
missing coordinates in the Lagrangian then there are conserved 
momenta conjugate to those coordinates. If the Lagrangian is 
independent of time then there is a conserved energy. 




1.12 Projects 


Exercise 1.38: A numerical investigation 


Consider a pendulum: a mass m supported on a massless rod of length 
l, in a uniform gravitational field. A Lagrangian for the pendulum is: 


$$L(t, \theta, \dot{\theta}) = \tfrac{1}{2} m (l \dot{\theta})^2 + m g l \cos\theta$$


For the pendulum, the period of the motion depends on the amplitude. 
We wish to find trajectories of the pendulum with a given frequency. 
Three methods of doing this present themselves: (1) solution by the 
principle of least action, (2) numerical integration of Lagrange’s equa- 
tion, and (3) analytic solution (which requires some exposure to elliptic 
functions). We will carry out all three, and compare the solution trajec- 
tories. 

To be specific, consider the parameters $m = 1\,\mathrm{kg}$, $l = 1\,\mathrm{m}$, $g = 9.8\,\mathrm{m\,s^{-2}}$. The frequency of small-amplitude oscillations is $\omega_0 = \sqrt{g/l}$. Let’s find the non-trivial solution that has the frequency $\omega_1 = \tfrac{1}{2}\omega_0$.


a. The angle is periodic in time, so a Fourier series representation is 
appropriate. We can choose the origin of time so that a zero crossing 
of the angle is at time zero. Since the potential is even in the angle, 
the angle is an odd function of time. Thus we need only a sine series. 
Since the angle returns to zero after one-half period the angle is an odd 
function of time about the midpoint. Thus only odd terms of the series 
are present: 


$$\theta(t) = \sum_{n=1}^{\infty} A_n \sin((2n - 1)\omega_1 t).$$

The amplitude of the trajectory is $A = \theta_{\max} = \sum_{n=1}^{\infty} (-1)^{n+1} A_n$.

Find approximations to the first few coefficients $A_n$ by minimizing
the action. You will have to write a program similar to the find-path 
procedure in section 1.4. Watch out: there is more than one trajectory 
that minimizes the action. 


b. Write a program to numerically integrate Lagrange’s equations for 
the trajectories of the pendulum. The trouble with using numerical 
integration to solve this problem is that we do not know how the fre- 
quency of the motion depends on the initial conditions. So we have to 
guess, and then gradually improve our guess. Define a function $\Omega(\dot{\theta}_0)$ that numerically computes the frequency of the motion as a function of the initial angular velocity $\dot{\theta}_0$ (with $\theta = 0$). Find the trajectory by solving $\Omega(\dot{\theta}_0) = \omega_1$ for the initial angular velocity of the desired trajectory. Methods of solving this equation include successive bisection, minimizing the squared residual, etc.; choose one.
Figure 1.11 The double pendulum is pinned in two joints so that its members are free to move in a plane.


c. Now let’s formulate the analytic solution for the frequency as a func- 
tion of amplitude. The period of the motion is simply 


$$T = 4 \int_0^{T/4} dt = 4 \int_0^{A} \frac{dt}{d\theta}\, d\theta.$$


Using the energy, solve for $\dot{\theta}$ in terms of the amplitude $A$ and $\theta$ to write
the required integral explicitly. This integral can be written in terms 
of elliptic functions, but in a sense this does not solve the problem—we 
still have to compute the elliptic functions. Let’s avoid this excursion 
into elliptic functions and just do the integral numerically using the 
procedure definite-integral. We still have the problem that we can 
specify the amplitude $A$ and get the frequency, but to solve our problem
we need to solve the inverse problem; that can be done as in part b.
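Here is one possible Python sketch of part c (not the book's code). It replaces the procedure `definite-integral` with a hand-written composite Simpson rule and uses the standard substitution $\sin(\theta/2) = k \sin u$ with $k = \sin(A/2)$, which turns the period integral into $T = 4\sqrt{l/g}\int_0^{\pi/2} du / \sqrt{1 - k^2 \sin^2 u}$ and removes the integrable singularity at $\theta = A$. The inverse problem is then solved by bisection, relying on the fact that the period grows monotonically with amplitude. The target frequency $\tfrac{1}{2}\omega_0$ is an assumption based on the exercise statement.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def period(A, g=9.8, l=1.0):
    """Pendulum period for amplitude 0 < A < pi, via sin(theta/2) = k sin(u)."""
    k = math.sin(A / 2)

    def integrand(u):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(u)) ** 2)

    return 4.0 * math.sqrt(l / g) * simpson(integrand, 0.0, math.pi / 2)


def amplitude_for_frequency(omega, g=9.8, l=1.0, tol=1e-12):
    """Bisection for the amplitude whose period is 2 pi / omega."""
    target = 2 * math.pi / omega
    lo, hi = 1e-6, math.pi - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if period(mid, g, l) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


omega0 = math.sqrt(9.8 / 1.0)               # small-amplitude frequency
A = amplitude_for_frequency(0.5 * omega0)   # target frequency omega0 / 2
print("amplitude (rad):", A, " period (s):", period(A))
```

As a check, `period` approaches $2\pi\sqrt{l/g}$ as the amplitude goes to zero; the resulting amplitude can be compared with those found in parts a and b.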


Exercise 1.39: Double pendulum behavior 

Consider the ideal double pendulum shown in figure 1.11.

a. Formulate a Lagrangian to describe the dynamics. Derive the equations of motion in terms of the given angles $\theta_1$ and $\theta_2$. Put the equations into a form appropriate for numerical integration. Assume the following system parameters:

$$g = 9.8\ \mathrm{m/sec^2}$$
$$l_1 = 1.0\ \mathrm{m}$$
$$l_2 = 0.9\ \mathrm{m}$$


b. Prepare graphs showing the behavior of each angle as a function of 
time when the system is started with the initial conditions: 


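$$\theta_1(0) = \frac{\pi}{2}\ \mathrm{radian}$$
$$\theta_2(0) = \pi\ \mathrm{radian}$$
$$\dot{\theta}_1(0) = 0\ \mathrm{radian/sec}$$
$$\dot{\theta}_2(0) = 0\ \mathrm{radian/sec}$$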


Make the graphs extend to 50 seconds. Save the state points at .125 
second intervals in a list. 


c. Make a graph of the behavior of the energy of your system as a 
function of time. The energy should be conserved. How good is the 
conservation you obtained? 
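For part a, one way to put the equations in a form suitable for numerical integration is to solve the two Lagrange equations for the angular accelerations. The Python sketch below assumes angles measured from the downward vertical and uses the Lagrangian $T - V$ with $T = \tfrac{1}{2}(m_1 + m_2) l_1^2 \dot{\theta}_1^2 + \tfrac{1}{2} m_2 l_2^2 \dot{\theta}_2^2 + m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 - \theta_2)$ and $V = -(m_1 + m_2) g l_1 \cos\theta_1 - m_2 g l_2 \cos\theta_2$. The masses are assumed values chosen for illustration, and the starting angles $\pi/2$ and $\pi$ are likewise an assumption. It integrates with fourth-order Runge-Kutta over the first 5 s, samples every 0.125 s, and reports the worst energy drift among the samples, which bears on the question in part c.

```python
import math

G = 9.8              # m/s^2 (from the exercise)
L1, L2 = 1.0, 0.9    # m (from the exercise)
M1, M2 = 1.0, 3.0    # kg (assumed values for illustration)


def accelerations(th1, th2, w1, w2):
    """Solve the two Lagrange equations for (D^2 theta1, D^2 theta2)."""
    c, s = math.cos(th1 - th2), math.sin(th1 - th2)
    # Linear system a11*x + a12*y = b1, a21*x + a22*y = b2.
    a11, a12 = (M1 + M2) * L1, M2 * L2 * c
    a21, a22 = L1 * c, L2
    b1 = -M2 * L2 * w2 ** 2 * s - (M1 + M2) * G * math.sin(th1)
    b2 = L1 * w1 ** 2 * s - G * math.sin(th2)
    det = a11 * a22 - a12 * a21      # equals L1*L2*(M1 + M2*s**2) > 0
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def derivative(state):
    th1, th2, w1, w2 = state
    a1, a2 = accelerations(th1, th2, w1, w2)
    return (w1, w2, a1, a2)


def rk4_step(state, dt):
    k1 = derivative(state)
    k2 = derivative(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)))
    k3 = derivative(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)))
    k4 = derivative(tuple(x + dt * k for x, k in zip(state, k3)))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy(state):
    th1, th2, w1, w2 = state
    T = (0.5 * (M1 + M2) * L1 ** 2 * w1 ** 2 + 0.5 * M2 * L2 ** 2 * w2 ** 2
         + M2 * L1 * L2 * w1 * w2 * math.cos(th1 - th2))
    V = -(M1 + M2) * G * L1 * math.cos(th1) - M2 * G * L2 * math.cos(th2)
    return T + V


def run(state, t_end, dt=1e-3, sample_every=125):
    """Integrate to t_end; keep every sample_every-th state (0.125 s by default)."""
    samples = [state]
    for step in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state, dt)
        if step % sample_every == 0:
            samples.append(state)
    return samples


start = (math.pi / 2, math.pi, 0.0, 0.0)   # assumed initial angles, at rest
trajectory = run(start, 5.0)
drift = max(abs(energy(s) - energy(start)) for s in trajectory)
print(len(trajectory), "samples; worst energy drift:", drift)
```

Extending `t_end` to 50 s and plotting the sampled angles would give the graphs asked for in part b.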


d. Repeat the experiment of part b with the $m_2$ bob $10^{-10}$ m higher than before. Form the list of squared differences of the distances between the $m_2$ bobs in the two experiments, and plot the log of that against time. What do you see?


e. Repeat the previous comparison, but this time with the initial con- 
ditions: 


$$\theta_1(0) = \frac{\pi}{2}\ \mathrm{radian}$$
$$\theta_2(0) = 0\ \mathrm{radian}$$
$$\dot{\theta}_1(0) = 0\ \mathrm{radian/sec}$$
$$\dot{\theta}_2(0) = 0\ \mathrm{radian/sec}$$


What do you see here? 


2 
Rigid Bodies 


The polhode rolls without slipping on the 
herpolhode lying in the invariable plane. 


Herbert Goldstein, Classical Mechanics (1950),
footnote on p. 161.


The motion of rigid bodies presents many surprising phenomena. 

Consider the motion of a top. A top is usually thought of as 
an axisymmetric body, subject to gravity, with a point on the 
axis of symmetry that is fixed in space. The top is spun, and in 
general executes some complicated motion. We observe that the 
top usually settles down into an unusual motion in which the axis 
of the top slowly precesses about the vertical, apparently moving 
perpendicular to the direction in which gravity is attempting to 
accelerate it. 

Consider the motion of a book thrown into the air.¹ Books
have three main axes. Idealized as a brick with rectangular faces, 
the three axes are the lines through the centers of opposite faces. 
Try spinning the book about each axis. The motion of the book 
spun about the longest and the shortest axis is a simple regular 
rotation, perhaps with a little wobble depending on how carefully 
it is thrown. The motion of the book spun about the intermediate 
axis is qualitatively different: however carefully the book is spun 
about the intermediate axis the book tumbles. 

The rotation of the Moon is peculiar in that the Moon always 
presents the same face to the Earth, indicating that the rotational 
period and the orbit period are the same. Considering that the 
orbit of the Moon is constantly changing because of interactions 
with the Sun and other planets, and therefore the orbital period 
of the Moon is constantly undergoing small variations, we might 
expect that the face of the Moon that we see would slowly change, 
but it does not. What is special about the face that is presented 
to us? 


¹We put a rubber band around the book so that it does not open.
A rigid body may be thought of as a large number of constituent
particles with rigid constraints among them. Thus the dynamical 
principles governing the motion of rigid bodies are the same as 
those governing the motion of any other system of particles with 
rigid constraints. What is new here is that the number of con- 
stituent particles is very large and we need to develop new tools 
to handle them effectively. 

We have found that a Lagrangian for a system with rigid con- 
straints can be written as the difference of the kinetic and po- 
tential energies. The kinetic and potential energies are naturally 
expressed in terms of the positions and velocities of the constituent 
particles. To write the Lagrangian in terms of the generalized co- 
ordinates and velocities we must specify functions that relate the 
generalized coordinates to the positions of the constituent parti- 
cles. In the systems with rigid constraints considered up to now 
these functions were explicitly given for each of the constituent 
particles and individually included in the derivation of the La- 
grangian. For a rigid body there are too many constituent particles
to handle each one of them in this way. We need to find means 
of expressing the kinetic and potential energies of rigid bodies in 
terms of the generalized coordinates and velocities, without going 
through the particle-by-particle details. 

The strategy is to first rewrite the kinetic and potential energies 
in terms of quantities that characterize essential aspects of the 
distribution of mass in the body and the state of motion of the 
body. Only later do we introduce generalized coordinates. For 
the kinetic energy, it turns out that a small number of parameters
completely specify the state of motion and the relevant aspects 
of the distribution of mass in the body. For the potential energy, 
we find that for some specific problems the potential energy can 
be represented with a small number of parameters, but in general 
we have to make approximations to obtain a representation with 
a manageable number of parameters. 


2.1 Rotational Kinetic Energy 


We consider a rigid body to be made up of a large number of constituent particles with masses $m_\alpha$, positions $\vec{x}_\alpha$, and velocities $\dot{\vec{x}}_\alpha$, with rigid positional constraints among them. The kinetic energy is


$$\sum_\alpha \tfrac{1}{2} m_\alpha\, \dot{\vec{x}}_\alpha \cdot \dot{\vec{x}}_\alpha. \qquad (2.1)$$


It turns out that the kinetic energy of a rigid body can be sepa- 
rated into two pieces: a kinetic energy of translation and a kinetic 
energy of rotation. Let’s see how this comes about. 

The configuration of a rigid body is fully specified given the 
location of any point in the body and the orientation of the body. 
This suggests that it would be useful to decompose the position vectors for the constituent particles as the sum of the vector $\vec{X}$ to some reference position in the body and the vector $\vec{\xi}_\alpha$ from the reference position to the particular constituent element with index $\alpha$:


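$$\vec{x}_\alpha = \vec{X} + \vec{\xi}_\alpha. \qquad (2.2)$$

Along paths, the velocities are related by

$$\dot{\vec{x}}_\alpha = \dot{\vec{X}} + \dot{\vec{\xi}}_\alpha. \qquad (2.3)$$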


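So in terms of $\vec{X}$ and $\vec{\xi}_\alpha$, the kinetic energy is

$$\sum_\alpha \tfrac{1}{2} m_\alpha\, (\dot{\vec{X}} + \dot{\vec{\xi}}_\alpha) \cdot (\dot{\vec{X}} + \dot{\vec{\xi}}_\alpha). \qquad (2.4)$$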


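If we select the reference position in the body to be its center of mass,

$$\vec{X} = \frac{1}{M} \sum_\alpha m_\alpha \vec{x}_\alpha, \qquad (2.5)$$

where $M = \sum_\alpha m_\alpha$ is the total mass of the body, then

$$\sum_\alpha m_\alpha \vec{\xi}_\alpha = \sum_\alpha m_\alpha (\vec{x}_\alpha - \vec{X}) = 0. \qquad (2.6)$$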




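So along paths the relative velocities satisfy

$$\sum_\alpha m_\alpha \dot{\vec{\xi}}_\alpha = 0. \qquad (2.7)$$

The kinetic energy is then

$$\tfrac{1}{2} M\, \dot{\vec{X}} \cdot \dot{\vec{X}} + \sum_\alpha \tfrac{1}{2} m_\alpha\, \dot{\vec{\xi}}_\alpha \cdot \dot{\vec{\xi}}_\alpha. \qquad (2.8)$$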


The kinetic energy is the sum of the kinetic energy of the motion 
of the total mass at the center of mass 


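$$\tfrac{1}{2} M\, \dot{\vec{X}} \cdot \dot{\vec{X}}, \qquad (2.9)$$

and the kinetic energy of rotation about the center of mass

$$\sum_\alpha \tfrac{1}{2} m_\alpha\, \dot{\vec{\xi}}_\alpha \cdot \dot{\vec{\xi}}_\alpha. \qquad (2.10)$$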


Written in terms of appropriate generalized coordinates the ki- 
netic energy is a Lagrangian for a free rigid body. If we choose 
generalized coordinates so that the center of mass position is en- 
tirely specified by some of them and the orientation is entirely 
specified by others, then the Lagrange equations for a free rigid 
body will decouple into two groups of equations, one concerned 
with the motion of the center of mass and one concerned with the 
orientation. 

Such a separation might occur in other problems, such as a 
rigid body moving in a uniform gravitational field, but in general, 
potential energies cannot be separated as the kinetic energy sep- 
arates. So the motion of the center of mass and the rotational 
motion are usually coupled through the potential. Even in these 
cases, it is usually an advantage to choose generalized coordinates 
that separately specify the position of the center of mass and the 
orientation. 
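The split of the kinetic energy in (2.8) through (2.10) is easy to check numerically. The Python sketch below uses randomly generated masses and velocities (an arbitrary, hypothetical particle set); the identity depends only on measuring relative velocities from the center-of-mass velocity, so rigidity is not needed for this check. It also confirms (2.7), that the mass-weighted relative velocities sum to zero.

```python
import random


def mass_weighted_mean(masses, vectors):
    """(1/M) sum_a m_a v_a; applied to positions this is (2.5),
    applied to velocities it gives DX."""
    M = sum(masses)
    return [sum(m * v[k] for m, v in zip(masses, vectors)) / M
            for k in range(3)]


def squared(v):
    return sum(c * c for c in v)


def kinetic_energy_split(masses, velocities):
    """Return (total (2.1), center-of-mass part (2.9), relative part (2.10),
    list of relative velocities)."""
    M = sum(masses)
    X_dot = mass_weighted_mean(masses, velocities)
    xi_dots = [[v[k] - X_dot[k] for k in range(3)] for v in velocities]
    total = sum(0.5 * m * squared(v) for m, v in zip(masses, velocities))
    translational = 0.5 * M * squared(X_dot)
    relative = sum(0.5 * m * squared(xd) for m, xd in zip(masses, xi_dots))
    return total, translational, relative, xi_dots


rng = random.Random(1)                      # hypothetical particle data
masses = [rng.uniform(0.5, 2.0) for _ in range(8)]
velocities = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
total, trans, rel, xi_dots = kinetic_energy_split(masses, velocities)
print(total, trans + rel)
```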


2.2 Kinematics of Rotation 


The motion of a rigid body about a center of rotation, a reference 
position that is fixed with respect to the body, is characterized 


2.2 Kinematics of Rotation 117 


at each moment by a rotation axis and a rate of rotation. Let’s 
elaborate. 

We can get from any orientation of a body to any other orien- 
tation of the body by a rotation of the body. That this is true is 
called Euler’s theorem.² We know that rotations have the property that they do not commute: the composition of successive rotations in general depends on the order of operation. Rotating a book about the $\hat{x}$ axis and then about the $\hat{z}$ axis puts the book in a different orientation than rotating the book about the $\hat{z}$ axis and then about the $\hat{x}$ axis. Nevertheless, Euler’s theorem states that however many rotations have been composed to reach a given orientation, the orientation could have been reached with a single rotation. Try it! We take a book, rotate it this way, then that, and then some other way, and then find the rotation that does the job in one step. So a rotation can be specified by an axis of rotation and the angular amount of the rotation.

If the orientation of a body evolves over some interval of time 
then the orientation at the beginning and the end of the interval 
can be connected by a single rotation. In the limit that the du- 
ration of the interval goes to zero the rotation axis approaches a 
unique instantaneous rotation axis. And in this limit the ratio of 
the angle of rotation and the duration of the interval approaches 
the instantaneous rate of rotation. We represent this instantaneous rotational motion by the angular velocity vector $\vec{\omega}$, which
points in the direction of the rotation axis (with the right-hand 
rule giving the direction of rotation about the axis) and has a 
magnitude equal to the rate of rotation. 

If the angular velocity vector for a body is $\vec{\omega}$ then the velocities of the constituent particles are perpendicular to the vectors to the constituent particles and proportional to the rate of rotation of the body and the distance of the constituent particle from the instantaneous rotation axis:

$$\dot{\vec{\xi}}_\alpha = \vec{\omega} \times \vec{\xi}_\alpha. \qquad (2.11)$$


Isn’t it interesting that we have found a concise way of specify- 
ing how the orientation of the body is changing, even though we 
have not yet described a way to specify the orientation itself. 
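Equation (2.11) can be checked with plain vector arithmetic. In the Python sketch below the angular velocity and the constituent position are arbitrary illustrative values; the checks confirm that $\vec{\omega} \times \vec{\xi}_\alpha$ is perpendicular to both $\vec{\xi}_\alpha$ and $\vec{\omega}$, and that its magnitude is the rate of rotation times the distance from the rotation axis.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def distance_from_axis(xi, omega):
    """Perpendicular distance from point xi to the axis through the origin along omega."""
    u = [w / norm(omega) for w in omega]
    along = dot(xi, u)
    return norm([x - along * c for x, c in zip(xi, u)])


omega = (0.3, -1.2, 2.0)     # illustrative angular velocity (rad/s)
xi = (0.5, 0.4, -0.7)        # illustrative position relative to the center
xi_dot = cross(omega, xi)    # equation (2.11)
print(xi_dot, norm(xi_dot), norm(omega) * distance_from_axis(xi, omega))
```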


²For an elementary geometric proof of Euler’s theorem see Whittaker [43].
2.3 Moments of Inertia


The rotational kinetic energy is the sum of the kinetic energy of 
each of the constituents of the rigid body. We can rewrite the 
rotational kinetic energy in terms of the angular velocity vector 
and certain aggregate quantities determined by the distribution 
of mass in the rigid body. 

Substituting our representation of the relative velocity vectors 
into the rotational kinetic energy we obtain 


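$$\sum_\alpha \tfrac{1}{2} m_\alpha\, \dot{\vec{\xi}}_\alpha \cdot \dot{\vec{\xi}}_\alpha = \sum_\alpha \tfrac{1}{2} m_\alpha\, (\vec{\omega} \times \vec{\xi}_\alpha) \cdot (\vec{\omega} \times \vec{\xi}_\alpha). \qquad (2.12)$$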


We introduce an arbitrary rectangular coordinate system with origin at the center of rotation and with basis vectors $\hat{e}_0$, $\hat{e}_1$, and $\hat{e}_2$, with the property that $\hat{e}_0 \times \hat{e}_1 = \hat{e}_2$. The components of $\vec{\omega}$ on this coordinate system are $\omega^0$, $\omega^1$, and $\omega^2$. Rewriting $\vec{\omega}$ in terms of its components, the rotational kinetic energy becomes


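$$\begin{aligned}
&\sum_\alpha \tfrac{1}{2} m_\alpha \Big(\big(\textstyle\sum_i \omega^i \hat{e}_i\big) \times \vec{\xi}_\alpha\Big) \cdot \Big(\big(\textstyle\sum_j \omega^j \hat{e}_j\big) \times \vec{\xi}_\alpha\Big) \\
&\quad = \tfrac{1}{2} \sum_{i,j} \omega^i \omega^j \sum_\alpha m_\alpha\, (\hat{e}_i \times \vec{\xi}_\alpha) \cdot (\hat{e}_j \times \vec{\xi}_\alpha) \\
&\quad = \tfrac{1}{2} \sum_{i,j} \omega^i \omega^j I_{ij}, \qquad (2.13)
\end{aligned}$$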


with 


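$$I_{ij} = \sum_\alpha m_\alpha\, (\hat{e}_i \times \vec{\xi}_\alpha) \cdot (\hat{e}_j \times \vec{\xi}_\alpha). \qquad (2.14)$$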


The quantities $I_{ij}$ are the components of the inertia tensor with respect to the chosen coordinate system. Note what a remarkable form the kinetic energy has taken. All we have done is interchange the order of summations, but now the kinetic energy is written as a sum of products of components of the angular velocity vector, which completely specify how the orientation of the body is changing, and the quantity $I_{ij}$, which depends solely on the distribution of mass in the body relative to the chosen coordinate system.
We will deduce a number of properties of the inertia tensor. 
First, we find a somewhat simpler expression for it. The components of the vector $\vec{\xi}_\alpha$ are $(\xi_\alpha, \eta_\alpha, \zeta_\alpha)$.³ Rewriting $\vec{\xi}_\alpha$ as a sum over its components, and simplifying the elementary vector products of basis vectors, the components of the inertia tensor can be arranged in the inertia matrix $\mathbf{I}$, which looks like:


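$$\mathbf{I} = \begin{bmatrix}
\sum_\alpha m_\alpha (\eta_\alpha^2 + \zeta_\alpha^2) & -\sum_\alpha m_\alpha \xi_\alpha \eta_\alpha & -\sum_\alpha m_\alpha \xi_\alpha \zeta_\alpha \\
-\sum_\alpha m_\alpha \eta_\alpha \xi_\alpha & \sum_\alpha m_\alpha (\xi_\alpha^2 + \zeta_\alpha^2) & -\sum_\alpha m_\alpha \eta_\alpha \zeta_\alpha \\
-\sum_\alpha m_\alpha \zeta_\alpha \xi_\alpha & -\sum_\alpha m_\alpha \zeta_\alpha \eta_\alpha & \sum_\alpha m_\alpha (\xi_\alpha^2 + \eta_\alpha^2)
\end{bmatrix}. \qquad (2.15)$$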


The inertia tensor has real components and is symmetric: $I_{jk} = I_{kj}$.
We define the moment of inertia $I$ about a line by

$$I = \sum_\alpha m_\alpha (\xi_\alpha^\perp)^2, \qquad (2.16)$$

where $\xi_\alpha^\perp$ is the perpendicular distance from the line to the constituent with index $\alpha$. The diagonal components of the inertia tensor $I_{ii}$ are recognized as the moments of inertia about the lines coinciding with the coordinate axes $\hat{e}_i$. The off-diagonal components of the inertia tensor are called products of inertia.

The rotational kinetic energy of a body depends on the distri- 
bution of mass of the body solely through the inertia tensor. Re- 
markably, the inertia tensor involves only second order moments 
of the mass distribution with respect to the center of mass. We 
might have expected the kinetic energy to depend in a complicated 
way on all the moments of the mass distribution, interwoven in 
some complicated way with the components of the angular ve- 
locity vector, but this is not the case. This fact has a remarkable 
consequence: for the motion of a free rigid body the detailed shape 
of the body does not matter. If a book and a banana have the 
same inertia tensor, that is, the same second order mass moments, 
then if they are thrown in the same way the subsequent motion 
will be the same, however complicated that motion is. The fact 
that the book has corners and the banana has a stem does not affect
the motion except for their contributions to the inertia tensor. In 
general, the potential energy of an extended body is not so simple 


³Here we avoid the more consistent notation $(\xi_\alpha^0, \xi_\alpha^1, \xi_\alpha^2)$ for the components of $\vec{\xi}_\alpha$ because it is awkward to write expressions involving powers of the components written this way.
and does indeed depend on all moments of the mass distribution,
but for the kinetic energy the second moments are all that matter!
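The book-and-banana remark can be tried out with point masses. In this Python sketch (the two bodies and the helper names are hypothetical illustrations, not from the text), an octahedron of six unit masses at distance 1 from the center and a cube of eight unit masses at the corners $(\pm\frac12, \pm\frac12, \pm\frac12)$ both have an inertia tensor equal to 4 times the identity about their centers of mass, although their shapes and total masses differ. Summing $\frac12 m_\alpha|\vec\omega\times\vec\xi_\alpha|^2$ directly over the constituents gives the same rotational kinetic energy for both.

```python
# Sketch: two different point-mass bodies with the same inertia tensor
# have the same rotational kinetic energy for every angular velocity.
import itertools


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def rotational_ke(masses, positions, omega):
    """Sum over constituents of (1/2) m |omega x xi|^2."""
    total = 0.0
    for m, xi in zip(masses, positions):
        v = cross(omega, xi)
        total += 0.5 * m * sum(c * c for c in v)
    return total


# Body 1: six unit masses at the vertices of an octahedron.
octa_m = [1.0] * 6
octa_x = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

# Body 2: eight unit masses at the corners of a cube of side 1.
cube_m = [1.0] * 8
cube_x = [(0.5 * sx, 0.5 * sy, 0.5 * sz)
          for sx, sy, sz in itertools.product((1, -1), repeat=3)]

# Both inertia tensors equal 4 * identity, so T = 2 |omega|^2 for either body.
omega = (0.3, -1.2, 0.7)
print(rotational_ke(octa_m, octa_x, omega), rotational_ke(cube_m, cube_x, omega))
```

Both printed values agree with $2|\vec\omega|^2$ up to rounding, for any choice of `omega`.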


Exercise 2.1: Rotational kinetic energy
An interesting alternate form for the rotational kinetic energy can be
found by decomposing $\vec\xi_\alpha$ into components parallel and perpendicular
to the rotation axis $\hat\omega$. Show that the rotational kinetic energy can also
be written

$$ T_R = \tfrac12 I\omega^2, \tag{2.17} $$

where $I$ is the moment of inertia about the line through the center of
mass with direction $\hat\omega$, and $\omega$ is the instantaneous rate of rotation.


Exercise 2.2: Steiner’s theorem 


Let $I$ be the moment of inertia of a body with respect to some given line
through the center of mass. Show that the moment of inertia $I'$ with
respect to a second line parallel to the first is

$$ I' = I + MR^2, \tag{2.18} $$

where $M$ is the total mass of the body and $R$ is the distance between
the lines.


Exercise 2.3: Some useful moments of inertia 
Show that the moments of inertia of the following objects are as given: 


a. The moment of inertia of a sphere of uniform density with mass $M$
and radius $R$ about any line through the center is $\frac25 MR^2$.


b. The moment of inertia of a spherical shell with mass $M$ and radius
$R$ about any line through the center is $\frac23 MR^2$.


c. The moment of inertia of a cylinder of uniform density with mass $M$
and radius $R$ about the axis of the cylinder is $\frac12 MR^2$.


d. The moment of inertia of a thin rod of uniform density per unit
length with mass $M$ and length $L$ about an axis perpendicular to the
rod through the center of mass is $\frac1{12} ML^2$.


Exercise 2.4: Jupiter 


a. The density of a planet increases toward the center. Provide an 
argument that the moment of inertia is less than that of a sphere of 
uniform density of the same mass and radius. 


b. The density as a function of radius inside Jupiter is well approximated by

$$ \rho(r) = \frac{M\sin(\pi r/R)}{4R^2 r}, $$

where $M$ is the mass and $R$ is the radius of Jupiter. Find the moment
of inertia of Jupiter in terms of $M$ and $R$.


2.4 Inertia Tensor 


The representation of the rotational kinetic energy in terms of the
inertia tensor was derived with the help of a rectangular coordinate
system with basis vectors $\hat e_i$. There was nothing special about
this particular rectangular basis. So, the kinetic energy must have 
the same form in any rectangular coordinate system. We can use 
this fact to derive how the inertia tensor changes if the body or 
the coordinate system is rotated. 

Let’s talk a bit about active and passive rotations. The rotation
of the vector $\vec x$ by the rotation $\mathsf R$ produces a new vector
$\vec x\,' = \mathsf R\vec x$. We may write $\vec x$ in terms of its components with
respect to some arbitrary rectangular coordinate system with orthonormal
basis vectors $\hat e_i$: $\vec x = x^0\hat e_0 + x^1\hat e_1 + x^2\hat e_2$. Let $x$ indicate
the column matrix of components $x^0$, $x^1$, and $x^2$ of $\vec x$, and let $R$ be
the matrix representation of $\mathsf R$ with respect to the same basis. In
these terms the rotation can be written $x' = Rx$. The rotation matrix $R$
is a real orthogonal matrix.⁴ A rotation that carries vectors to new
vectors is called an active rotation.

Alternately, we can rotate the coordinate system by rotating the
basis vectors, but leave other vectors that might be represented
in terms of them unchanged. If a vector is unchanged but the
basis vectors are rotated, then the components of the vector on
the rotated basis vectors are not the same as the components
on the original basis vectors. Denote the rotated basis vectors
by $\hat e'_i = \mathsf R\hat e_i$. The component of a vector along a basis vector
is the dot product of the vector with the basis vector. So the
components of the vector $\vec x$ along the rotated basis $\hat e'_i$ are
$(x')^i = \vec x\cdot\hat e'_i = \vec x\cdot(\mathsf R\hat e_i) = (\mathsf R^{-1}\vec x)\cdot\hat e_i$.⁵ So the components with respect to
the rotated basis elements are the same as the components of the
rotated vector $\mathsf R^{-1}\vec x$ with respect to the original basis. In terms
of components, if the vector has components $x$ with respect to
the original basis vectors $\hat e_i$, then the components $x'$ of the same


⁴An orthogonal matrix $R$ satisfies $R^\mathsf{T} = R^{-1}$; a rotation matrix is an orthogonal matrix that also satisfies $\det R = 1$.


⁵The last equality follows from the fact that rotation preserves the dot product of two vectors:
$\vec x\cdot\vec y = (\mathsf R\vec x)\cdot(\mathsf R\vec y)$, or $(\mathsf R^{-1}\vec x)\cdot\vec y = \vec x\cdot(\mathsf R\vec y)$.
vector with respect to the rotated basis vectors $\hat e'_i$ are $x' = R^{-1}x$,
or equivalently $x = Rx'$. A rotation that actively rotates the
basis vectors, leaving other vectors unchanged, is called a passive
rotation. For a passive rotation the components of a fixed vector
change as if the vector were actively rotated by the inverse rotation.
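The distinction between active and passive rotations can be checked numerically. This Python sketch (the helpers `rot_z`, `matvec`, and `transpose` are hypothetical names; the angle and vector are arbitrary) rotates the basis by a rotation about $\hat z$, takes the dot products of a fixed vector with the rotated basis vectors, and compares the result with the inverse rotation applied actively, $R^{-1}x = R^\mathsf{T}x$.

```python
# Sketch: active versus passive rotation of components.
import math


def rot_z(angle):
    """Matrix of a right-handed rotation about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]


R = rot_z(0.7)
x = [1.0, 2.0, 3.0]

# Active rotation: the vector itself moves; its new components are R x.
active = matvec(R, x)

# Passive rotation: the rotated basis vectors e'_j = R e_j are the columns
# of R; the components of the unchanged vector are dot products with them.
rotated_basis = [[R[i][j] for i in range(3)] for j in range(3)]
passive = [sum(xi * ei for xi, ei in zip(x, e)) for e in rotated_basis]

# These agree with the inverse rotation applied actively: R^{-1} x = R^T x.
inverse_rotated = matvec(transpose(R), x)
print(active, passive, inverse_rotated)
```

Applying $R$ to the passive components recovers the original components, which is the relation $x = Rx'$.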

With respect to the rectangular basis $\hat e_i$ the rotational kinetic
energy is written

$$ \frac12\sum_{ij}\omega^i I_{ij}\,\omega^j. \tag{2.19} $$

In terms of matrix representations, the kinetic energy is

$$ \frac12\,\omega^\mathsf{T} I\,\omega, \tag{2.20} $$


where $\omega$ is the column of components representing $\vec\omega$.⁶ If we rotate
the coordinate system by the passive rotation $\mathsf R$ about the center
of rotation, the new basis vectors are $\hat e'_i = \mathsf R\hat e_i$. The components
$\omega'$ of the vector $\vec\omega$ with respect to the rotated coordinate system
satisfy

$$ \omega = R\,\omega', \tag{2.21} $$

where $R$ is the matrix representation of $\mathsf R$. The kinetic energy is

$$ \frac12(\omega')^\mathsf{T} R^\mathsf{T} I R\,\omega'. \tag{2.22} $$


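However, if we had started with the basis $\hat e'_i$, we would have written
the kinetic energy directly as

$$ \frac12(\omega')^\mathsf{T} I'\,\omega', \tag{2.23} $$

where the components are taken with respect to the $\hat e'_i$ basis. Comparing
the two expressions, which agree for every $\omega'$ and involve symmetric
matrices, we see that

$$ I' = R^\mathsf{T} I R. \tag{2.24} $$

Since $R^\mathsf{T} = R^{-1}$, the inertia matrix transforms by a similarity
transformation.⁷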
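The coordinate-system independence of the kinetic energy can be checked with numbers. In this Python sketch (the sample inertia matrix, the rotation angles, and all helper names are hypothetical), $I'$ is formed from (2.24), $\omega$ from (2.21), and the two expressions (2.20) and (2.23) for the kinetic energy are compared. Because (2.24) is a similarity transformation, the trace of the inertia matrix is also unchanged.

```python
# Sketch: the kinetic energy is the same whether computed with (I, omega)
# in the original basis or (I', omega') in a rotated basis, where
# I' = R^T I R (2.24) and omega = R omega' (2.21).
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def kinetic(I, w):
    """(1/2) w^T I w, as in (2.20)."""
    Iw = matvec(I, w)
    return 0.5 * sum(a * b for a, b in zip(w, Iw))


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


I = [[8.0, -1.0, -1.0], [-1.0, 3.5, 2.0], [-1.0, 2.0, 7.5]]   # sample symmetric matrix
R = matmul(rot_z(0.4), matmul(rot_x(1.1), rot_z(-0.3)))        # some rotation
I_prime = matmul(transpose(R), matmul(I, R))                    # equation (2.24)

w_prime = [0.2, -0.5, 1.3]       # components on the rotated basis
w = matvec(R, w_prime)           # equation (2.21)
print(kinetic(I, w), kinetic(I_prime, w_prime))
```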


⁶We take a 1-by-1 matrix as a number.


⁷That the inertia tensor transforms in this manner could have been deduced
from its definition (2.14). However, this argument, based on the
coordinate-system independence of the kinetic energy, seems to provide insight.
2.5 Principal Moments of Inertia


We can use the transformation properties of the inertia tensor (2.24)
to show that there are special rectangular coordinate systems for which
the inertia tensor $I'$ is diagonal, that is, $I'_{ij} = 0$ for $i \ne j$. Let’s
assume that $I'$ is diagonal and solve for the rotation matrix $R$ that
does the job. Multiplying both sides of (2.24) on the left by $R$, and
using $RR^\mathsf{T} = 1$, we have


$$ RI' = IR. \tag{2.25} $$


We can examine pieces of this matrix equation by multiplying on
the right by a trivial column vector that picks out a particular
column. So we multiply on the right by the column matrix representation
$e_i$ of each of the coordinate unit vectors $\hat e_i$. These column matrices
have a one in the $i$th row and zeros otherwise. Let $e'_i$ be the column
matrix of components of $\hat e'_i$ with respect to the original basis. Using
$e'_i = Re_i$, we find

$$ RI'e_i = IRe_i = Ie'_i. \tag{2.26} $$
The matrix $I'$ is diagonal, so

$$ RI'e_i = Re_i\,I'_{ii} = I'_{ii}\,e'_i. \tag{2.27} $$


So, from equations (2.26) and (2.27), we have

$$ Ie'_i = I'_{ii}\,e'_i, \tag{2.28} $$

which we recognize as an equation for the eigenvalue $I'_{ii}$ and $e'_i$,
the column matrix of components of the associated eigenvector.

From $e'_i = Re_i$, we see that the $e'_i$ are the columns of the
rotation matrix $R$. Now, rotation matrices are orthogonal, so
$R^\mathsf{T}R = 1$; thus the columns of the rotation matrix must be
orthonormal: $(e'_i)^\mathsf{T}e'_j = \delta_{ij}$, which is one if $i = j$ and zero otherwise.
But the eigenvectors that are solutions of equation (2.28) are not
necessarily normalized, nor even orthogonal. So we are not done yet.

If a matrix is real and symmetric then its eigenvalues are real.
Furthermore, if the eigenvalues are distinct then the eigenvectors
are orthogonal. However, if the eigenvalues are not distinct then
the directions of the eigenvectors for the degenerate eigenvalues
are not uniquely determined; we have the freedom to choose particular
$e'_i$ that are orthogonal.⁸ The linearity of equation (2.28)
implies the $e'_i$ can be normalized. Thus whether or not the eigenvalues
are distinct we can obtain an orthonormal set of $e'_i$. This
is enough to reconstruct a rotation matrix $R$ that does the job
we asked of it: to rotate the coordinate system to a configuration
such that the inertia tensor is diagonal. (If the orthonormal columns
happen to give $\det R = -1$, reversing the sign of one of them makes
$\det R = 1$.) If the eigenvalues are not distinct, the rotation matrix
$R$ is not uniquely defined: there is more than one rotation matrix $R$
that does the job.
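The construction above can be carried out numerically. The Python sketch below swaps in the cyclic Jacobi method, a standard technique for real symmetric matrices that is not the procedure used in the text: it repeatedly applies plane rotations that zero one off-diagonal entry at a time, so the accumulated product of those rotations is a rotation matrix $R$ whose columns approach the eigenvectors $e'_i$. The eigenvalues are then sorted so that $A \le B \le C$; because reordering the columns can change the sign of $\det R$, one column is reversed when needed. The function names, the tolerance, and the sample matrix are assumptions made for illustration.

```python
# Sketch: principal moments and principal axes of a symmetric 3x3 inertia
# matrix by cyclic Jacobi rotations (a swapped-in numerical method).
import math


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(P):
    return [[P[j][i] for j in range(3)] for i in range(3)]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def plane_rotation(p, q, theta):
    """Rotation by theta in the (p, q) coordinate plane; det = +1."""
    J = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    c, s = math.cos(theta), math.sin(theta)
    J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
    return J


def principal_axes(I, max_sweeps=50, tol=1e-24):
    """Return (moments, R): moments sorted so A <= B <= C, and R a rotation
    matrix whose columns are the corresponding unit eigenvectors."""
    a = [row[:] for row in I]
    R = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(3) for q in range(3) if p != q)
        if off < tol:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Angle that makes the (p, q) entry of J^T a J vanish.
            theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
            J = plane_rotation(p, q, theta)
            a = matmul(transpose(J), matmul(a, J))   # a stays equal to R^T I R
            R = matmul(R, J)
    order = sorted(range(3), key=lambda i: a[i][i])
    moments = [a[i][i] for i in order]
    R = [[R[r][i] for i in order] for r in range(3)]
    if det3(R) < 0:            # reordering columns may reverse handedness;
        for r in range(3):     # flipping one axis restores det R = +1
            R[r][2] = -R[r][2]
    return moments, R


I = [[8.0, -1.0, -1.0], [-1.0, 3.5, 2.0], [-1.0, 2.0, 7.5]]
(A, B, C), R = principal_axes(I)
D = matmul(transpose(R), matmul(I, R))   # should be diag(A, B, C)
print(A, B, C)
```

The matrix `D` is diagonal to rounding error, $R$ is orthogonal with $\det R = 1$, and $A + B + C$ equals the trace of the original matrix.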

The eigenvectors and eigenvalues are determined by the requirement
that the inertia tensor be diagonal with respect to the rotated
coordinate system. Thus the rotated coordinate system has
a special orientation with respect to the body. The basis vectors
$\hat e'_i$ therefore actually point along particular directions in the body.
We define the axes in the body through the center of mass with
these directions to be the principal axes. With respect to the coordinate
system defined by $\hat e'_i$ the inertia tensor is diagonal, by
construction, with the eigenvalues $I'_{ii}$ on the diagonal. Thus the
moments of inertia about the principal axes are the eigenvalues
$I'_{ii}$. We call the moments of inertia about the principal axes the
principal moments of inertia.

For convenience, we often label the principal moments of inertia
according to their size, $A \le B \le C$, with principal axis unit vectors
$\hat a$, $\hat b$, $\hat c$, respectively. The positive direction along the principal
axes can be chosen so that $\hat a$, $\hat b$, $\hat c$ form a right-handed rectangular
coordinate basis.

Let $x$ represent the matrix of components of a vector $\vec x$ with
respect to the basis vectors $\hat e_i$. Recall that the components $x'$ of the
vector $\vec x$ with respect to the principal axis unit vectors $\hat e'_i$ satisfy

$$ x' = R^\mathsf{T}x. \tag{2.29} $$

This makes sense because the columns of $R$ are the components
of the $\hat e'_i$. Multiplying the components of $\vec x$ by the transpose of $R$
takes the dot product of each $\hat e'_i$ with $\vec x$, producing the components.
The components of a vector on the principal axis basis are
sometimes called the body components of the vector.


⁸If two eigenvalues are equal, then linear combinations of the associated
eigenvectors are also eigenvectors. This gives us the freedom to find linear
combinations of the eigenvectors that are orthonormal.
Now let’s rewrite the kinetic energy in terms of the principal moments
of inertia. If we choose our rectangular coordinate system
so that it coincides with the principal axes then the calculation
is simple. Let the components of the angular velocity vector on
the principal axes be $(\omega^a, \omega^b, \omega^c)$. Then, keeping in mind that the
inertia tensor is diagonal with respect to the principal axis basis,
the kinetic energy is just

$$ T_R = \frac12\left[A(\omega^a)^2 + B(\omega^b)^2 + C(\omega^c)^2\right]. \tag{2.30} $$


Exercise 2.5: A constraint on the moments of inertia 


Show that the sum of any two of the moments of inertia is greater than 
the third moment of inertia. 


Exercise 2.6: Principal moments of inertia 


For each of the configurations described below find the principal mo- 
ments of inertia with respect to the center of mass; find the correspond- 
ing principal axes. 


a. A regular tetrahedron consisting of four equal point masses tied to- 
gether with rigid massless wire. 


b. A cube of uniform density. 


c. Five equal point masses rigidly connected by massless stuff. The 
point masses are at the rectangular coordinates: 


(−1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0), (0, 0, 1)


Exercise 2.7: This book 


Measure this book. You will admit that it is pretty dense. Don’t worry, 
you will get to throw it later. Show that the principal axes are the lines 
connecting the centers of opposite faces of the idealized brick approx- 
imating the book. Compute the corresponding principal moments of 
inertia. 


2.6 Representation of the Angular Velocity Vector 


We can specify the orientation of a body by specifying the rotation 
that takes the body to this orientation from some reference orientation.
As the body moves, the rotation that does this changes.
The angular velocity vector can be written in terms of this chang- 
ing rotation along a path. 

Let q be the coordinate path that we will use to describe the 
motion of the body. Let M(q(t)) be the rotation that takes the 

Figure 2.1 The rotation $\mathsf M(q(t))$ rotates the body from a reference
orientation in which the principal axes are aligned with the basis $\hat e_i$
(labeled by $x$, $y$, and $z$ here) to the orientation specified by $q(t)$.


body from the reference orientation to the orientation specified by
$q(t)$ (see figure 2.1). Let $\vec\xi_\alpha(t)$ be the vector to some constituent
with the body in the orientation specified by $q(t)$, and let $\vec\xi'_\alpha$ be
the vector to the same constituent with the body in the reference
orientation. Then

$$ \vec\xi_\alpha(t) = \mathsf M(q(t))\,\vec\xi'_\alpha. \tag{2.31} $$


The constituent vectors $\vec\xi'_\alpha$ do not depend on the configuration,
because they are the vectors to the positions of the constituents
with the body in a fixed reference orientation.

We have already found an expression for the kinetic energy in 
terms of the angular velocity vector and the inertia tensor. Here 
we do this a different way. To compute the kinetic energy we 
accumulate the contributions from all of the mass elements. The 
positions of the constituent particles, at a given time t, are 


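$$ \vec\xi_\alpha(t) = \mathsf M(q(t))\,\vec\xi'_\alpha = \mathcal M(t)\,\vec\xi'_\alpha, \tag{2.32} $$

where $\mathcal M = \mathsf M\circ q$. The velocity is the time derivative

$$ D\vec\xi_\alpha(t) = D\mathcal M(t)\,\vec\xi'_\alpha. \tag{2.33} $$

Using equation (2.32) we can write

$$ D\vec\xi_\alpha(t) = D\mathcal M(t)\,(\mathcal M(t))^{-1}\,\vec\xi_\alpha(t). \tag{2.34} $$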
Recall that the velocity results from a rotation, and that the velocities
are (see equation 2.11)

$$ D\vec\xi_\alpha(t) = \vec\omega(t)\times\vec\xi_\alpha(t). \tag{2.35} $$

Thus we can identify the operator $\vec\omega(t)\times$ with $D\mathcal M(t)(\mathcal M(t))^{-1}$. To
form the kinetic energy we need to extract $\vec\omega(t)$ from this.

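If a vector $\vec u$ is represented by the component matrix $u$ with
components $x$, $y$, and $z$, the function $\mathcal A$ that produces the matrix
representation of $\vec u\times$ from the component matrix $u$ is

$$ \mathcal A(u) = \begin{pmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{pmatrix}. \tag{2.36} $$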


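The inverse of this function can be applied to any skew-symmetric
matrix, and so we can use $\mathcal A^{-1}$ to extract the components $\omega$ of
the angular velocity vector from the matrix representation of $\vec\omega\times$
in terms of $\mathcal M$:

$$ \omega = \mathcal A^{-1}\big(DM\,M^\mathsf{T}\big), \tag{2.37} $$

where $M$ and $DM$ are the matrix representations of the functions
$\mathcal M$ and $D\mathcal M$, and where we have used the fact that for a matrix
representation of a rotation the transpose gives the inverse. The
argument of $\mathcal A^{-1}$ is indeed skew-symmetric: differentiating $MM^\mathsf{T} = 1$
gives $DM\,M^\mathsf{T} + M\,(DM)^\mathsf{T} = 0$, and the second term is the transpose of the first.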

The components $\omega'$ of the angular velocity vector on the principal
axes are $\omega' = M^\mathsf{T}\omega$. So

$$ \omega' = M^\mathsf{T}\,\mathcal A^{-1}\big(DM\,M^\mathsf{T}\big). \tag{2.38} $$


The relationship of the angular velocity vector to the path is 
a kinematic relationship; it is valid for any path. Thus we can 
abstract it to obtain the components of the angular velocity at a 
moment given the configuration and velocity at that moment. 
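Equations (2.36) through (2.38) can be exercised on a path whose angular velocity is known in advance. In this Python sketch the helper names are hypothetical, and the derivative $DM$ is approximated by a central difference with an assumed step size, whereas the text's programs differentiate exactly. For a uniform rotation at rate $\Omega$ about a fixed unit vector $\hat n$ (built with Rodrigues’ formula), both $\omega$ and $\omega'$ should come out as $\Omega\hat n$, since the rotation leaves $\hat n$ fixed.

```python
# Sketch: recover the angular velocity from a path of rotation matrices
# using omega = A^{-1}(DM M^T) (2.37) and omega' = M^T omega (2.38).
# DM is approximated by a central difference (an assumption of this sketch).
import math


def cross_matrix(u):
    """The function A of (2.36): matrix of the operator u x."""
    x, y, z = u
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def cross_matrix_inverse(W):
    """A^{-1}: components of a skew-symmetric matrix (entries averaged)."""
    return [0.5 * (W[2][1] - W[1][2]),
            0.5 * (W[0][2] - W[2][0]),
            0.5 * (W[1][0] - W[0][1])]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(P):
    return [[P[j][i] for j in range(3)] for i in range(3)]


def axis_rotation(n, angle):
    """Rodrigues' formula: right-handed rotation by angle about unit vector n."""
    K = cross_matrix(n)
    K2 = matmul(K, K)
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def omega_from_path(M_of_t, t, h=1e-5):
    """Space and body components of the angular velocity at time t."""
    M, Mp, Mm = M_of_t(t), M_of_t(t + h), M_of_t(t - h)
    DM = [[(Mp[i][j] - Mm[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
    omega = cross_matrix_inverse(matmul(DM, transpose(M)))
    Mt = transpose(M)
    omega_body = [sum(Mt[i][j] * omega[j] for j in range(3)) for i in range(3)]
    return omega, omega_body


# Uniform rotation about a fixed unit axis n: expect omega = rate * n,
# and also omega' = rate * n because M leaves n unchanged.
n = [2.0 / 3.0, -1.0 / 3.0, 2.0 / 3.0]
rate = 1.7
omega, omega_body = omega_from_path(lambda t: axis_rotation(n, rate * t), 0.9)
print(omega, omega_body)
```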


Implementation of angular velocity functions 
The following procedure gives the components of the angular velocity
as a function of time along the path:

(define (((M-of-q->omega-of-t M-of-q) q) t)
  (define M-on-path (compose M-of-q q))
  (define (omega-cross t)
    (* ((D M-on-path) t)
       (m:transpose (M-on-path t))))
  (antisymmetric->column-matrix (omega-cross t)))


The procedure omega-cross produces the matrix representation of
$\vec\omega\times$. The procedure antisymmetric->column-matrix, which corresponds
to the function $\mathcal A^{-1}$, is used to extract the components of
the angular velocity vector from the skew-symmetric $\vec\omega\times$ matrix.

The body components of the angular velocity vector as a func- 
tion of time along the path are 


(define (((M-of-q->omega-body-of-t M-of-q) q) t)
  (* (m:transpose (M-of-q (q t)))
     (((M-of-q->omega-of-t M-of-q) q) t)))


We can get the procedures of local state that give the angu- 
lar velocity components by abstracting these procedures along ar- 
bitrary paths that have given coordinates and velocities. The 
abstraction of a procedure of a path to a procedure of state is 
accomplished by Gamma-bar (see section 1.6.1): 


(define (M->omega M-of-q)
  (Gamma-bar
   (M-of-q->omega-of-t M-of-q)))


(define (M->omega-body M-of-q)
  (Gamma-bar
   (M-of-q->omega-body-of-t M-of-q)))


These procedures give the angular velocities as a function of state. 
We will see them in action after we get some M-of-q’s with which 
to work. 


2.7 Euler Angles 


To go further we must finally specify a set of generalized coordi- 
nates. We first do this using the traditional Euler angles. Later, 
we find other ways of describing the orientation of a rigid body. 
We are using an intermediate representation of the orientation 
in terms of the function M of the generalized coordinates that gives 
the rotation that takes the body from some reference orientation
and rotates it to the orientation specified by the generalized coordinates.
Here we take the reference orientation so that the principal-axis
unit vectors $\hat a$, $\hat b$, $\hat c$ are coincident with the basis vectors $\hat e_i$,
labeled here by $\hat x$, $\hat y$, $\hat z$.

We define the Euler angles in terms of simple rotations about
the coordinate axes. Let $\mathsf R_x(\psi)$ be a right-handed rotation about
the $\hat x$ axis by the angle $\psi$, and let $\mathsf R_z(\psi)$ be a right-handed rotation
about the $\hat z$ axis by the angle $\psi$. The function $\mathsf M$ for Euler angles
is written as a composition of three of these simple coordinate axis
rotations:

$$ \mathsf M(\theta, \varphi, \psi) = \mathsf R_z(\varphi)\,\mathsf R_x(\theta)\,\mathsf R_z(\psi), \tag{2.39} $$

for the Euler angles $\theta$, $\varphi$, $\psi$.

The Euler angles can specify any orientation of the body, but
the orientation does not always correspond to a unique set of Euler
angles. In particular, if $\theta = 0$ then the orientation depends
only on the sum $\varphi + \psi$, because $\mathsf R_x(0)$ is the identity and
$\mathsf R_z(\varphi)\,\mathsf R_z(\psi) = \mathsf R_z(\varphi + \psi)$; so the orientation does not uniquely
determine either $\varphi$ or $\psi$.
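The degeneracy at $\theta = 0$ is easy to see numerically. This Python sketch (names hypothetical; it mirrors the standard matrices for right-handed rotations about $\hat z$ and $\hat x$ and the composition (2.39)) shifts $\varphi$ by some amount and $\psi$ by the opposite amount: at $\theta = 0$ the orientation is unchanged, while at $\theta \ne 0$ it changes.

```python
# Sketch: the Euler-angle rotation M(theta, phi, psi) = Rz(phi) Rx(theta) Rz(psi)
# of equation (2.39), and a check of the theta = 0 degeneracy.
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_M(theta, phi, psi):
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


def max_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(3) for j in range(3))


d = 0.8
# theta = 0: only phi + psi matters, so the shift leaves M unchanged.
print(max_diff(euler_to_M(0.0, 0.3, 1.1), euler_to_M(0.0, 0.3 + d, 1.1 - d)))
# theta != 0: the same shift gives a different orientation.
print(max_diff(euler_to_M(0.5, 0.3, 1.1), euler_to_M(0.5, 0.3 + d, 1.1 - d)))
```

The first printed difference is at the level of rounding error; the second is not.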


Exercise 2.8: Euler angles 


It is not immediately obvious that all orientations can be represented in 
terms of the Euler angles. To show that Euler angles are adequate to 
represent all orientations solve for Euler angles that give an arbitrary 
rotation R. Keep in mind that some orientations do not correspond to 
a unique representation in terms of Euler angles. 


Though the Euler angles allow us to specify all orientations and 
thus can be used as generalized coordinates, the definition of Euler 
angles is pretty arbitrary. In fact no reasoning has led us to them, 
and this is reflected in our presentation of them by just saying 
“here they are.” Euler angles are well suited for some problems 
and are cumbersome for others. 

There are other ways of defining similar sets of angles. For 
instance, we could also take our generalized coordinates to satisfy 


$$ \mathsf M'(\theta, \varphi, \psi) = \mathsf R_x(\varphi)\,\mathsf R_y(\theta)\,\mathsf R_z(\psi). \tag{2.40} $$


Such alternatives to the Euler angles come in handy from time to 
time. 
Each of the fundamental rotations can be represented as a matrix.
The rotation matrix representing a right-handed rotation
about the $\hat z$ axis by the angle $\psi$ is

$$ R_z(\psi) = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{2.41} $$

and a right-handed rotation about the $\hat x$ axis by the angle $\psi$ is
represented by the matrix

$$ R_x(\psi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi \end{pmatrix}. \tag{2.42} $$


The matrix that represents the rotation that carries the body from 
its reference orientation to the actual orientation is 


$$ R_z(\varphi)\,R_x(\theta)\,R_z(\psi). \tag{2.43} $$


The rotation matrices and their product can be constructed by 
simple programs: 


(define (rotate-z-matrix angle)
  (matrix-by-rows
   (list (cos angle) (- (sin angle)) 0)
   (list (sin angle) (cos angle) 0)
   (list 0 0 1)))


(define (rotate-x-matrix angle)
  (matrix-by-rows
   (list 1 0 0)
   (list 0 (cos angle) (- (sin angle)))
   (list 0 (sin angle) (cos angle))))


(define (Euler->M angles)
  (let ((theta (ref angles 0))
        (phi (ref angles 1))
        (psi (ref angles 2)))
    (* (rotate-z-matrix phi)
       (rotate-x-matrix theta)
       (rotate-z-matrix psi))))


Now that we have a procedure that implements a sample $\mathsf M$,
we can find the components of the angular velocity vector and the
body components of the angular velocity vector using the procedures
M-of-q->omega-of-t and M-of-q->omega-body-of-t from
section 2.6. For example,


(show-expression
 (((M-of-q->omega-body-of-t Euler->M)
   (up (literal-function 'theta)
       (literal-function 'phi)
       (literal-function 'psi)))
  't))


$$ \begin{pmatrix} D\varphi(t)\sin\theta(t)\sin\psi(t) + \cos\psi(t)\,D\theta(t) \\ D\varphi(t)\sin\theta(t)\cos\psi(t) - \sin\psi(t)\,D\theta(t) \\ \cos\theta(t)\,D\varphi(t) + D\psi(t) \end{pmatrix} $$


To construct the kinetic energy we need the procedure of state 
that gives the body components of the angular velocity vector: 


(show-expression
 ((M->omega-body Euler->M)
  (up 't
      (up 'theta 'phi 'psi)
      (up 'thetadot 'phidot 'psidot))))


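$$ \begin{pmatrix} \dot\varphi\sin\psi\sin\theta + \dot\theta\cos\psi \\ \dot\varphi\sin\theta\cos\psi - \dot\theta\sin\psi \\ \dot\varphi\cos\theta + \dot\psi \end{pmatrix} $$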


We capture this result as a procedure: 


(define (Euler-state->omega-body local)
  (let ((q (coordinate local)) (qdot (velocity local)))
    (let ((theta (ref q 0))            ; phi = (ref q 1) is not needed
          (psi (ref q 2))
          (thetadot (ref qdot 0))
          (phidot (ref qdot 1))
          (psidot (ref qdot 2)))
      (let ((omega-a (+ (* thetadot (cos psi))
                        (* phidot (sin theta) (sin psi))))
            (omega-b (+ (* -1 thetadot (sin psi))
                        (* phidot (sin theta) (cos psi))))
            (omega-c (+ (* phidot (cos theta)) psidot)))
        (column-matrix omega-a omega-b omega-c)))))
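As a consistency check on the formulas in Euler-state->omega-body, this Python sketch (names, sample state, and finite-difference step are hypothetical; it does not use scmutils) evaluates them at a sample state and compares them with a direct finite-difference evaluation of (2.38) along a path on which the Euler angles change at constant rates. The two results should agree to about the accuracy of the central difference.

```python
# Sketch: closed-form body components of omega versus a finite-difference
# evaluation of omega' = M^T A^{-1}(DM M^T), equation (2.38).
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(P):
    return [[P[j][i] for j in range(3)] for i in range(3)]


def euler_to_M(theta, phi, psi):
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


def omega_body_closed_form(q, qdot):
    theta, _phi, psi = q              # phi does not appear
    thetadot, phidot, psidot = qdot
    return [thetadot * math.cos(psi) + phidot * math.sin(theta) * math.sin(psi),
            -thetadot * math.sin(psi) + phidot * math.sin(theta) * math.cos(psi),
            phidot * math.cos(theta) + psidot]


def omega_body_numeric(q, qdot, h=1e-5):
    def M_at(t):                      # orientation along the straight-line path
        return euler_to_M(*[qi + t * vi for qi, vi in zip(q, qdot)])
    M, Mp, Mm = M_at(0.0), M_at(h), M_at(-h)
    DM = [[(Mp[i][j] - Mm[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
    W = matmul(DM, transpose(M))      # matrix of omega x
    omega = [0.5 * (W[2][1] - W[1][2]),
             0.5 * (W[0][2] - W[2][0]),
             0.5 * (W[1][0] - W[0][1])]
    Mt = transpose(M)
    return [sum(Mt[i][j] * omega[j] for j in range(3)) for i in range(3)]


q = (1.0, 0.4, -0.7)       # theta, phi, psi
qdot = (0.3, -1.1, 0.8)    # thetadot, phidot, psidot
print(omega_body_closed_form(q, qdot))
print(omega_body_numeric(q, qdot))
```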




The kinetic energy can be written: 


(define ((T-rigid-body A B C) local)
  (let ((omega-body (Euler-state->omega-body local)))
    (* 1/2
       (+ (* A (square (ref omega-body 0)))
          (* B (square (ref omega-body 1)))
          (* C (square (ref omega-body 2)))))))


2.8 Vector Angular Momentum 


The vector angular momentum of a particle is the cross product of 
the position and the linear momentum. For a rigid body the vector 
angular momentum is the sum of the vector angular momentum 
of each of the constituents. Here we find an expression for the 
vector angular momentum of a rigid body in terms of the inertia 
tensor and the angular velocity vector. 

The vector angular momentum of a rigid body is

$$ \sum_\alpha \vec x_\alpha\times(m_\alpha\dot{\vec x}_\alpha), \tag{2.44} $$

where $\vec x_\alpha$, $\dot{\vec x}_\alpha$, and $m_\alpha$ are the positions, velocities, and masses
of the constituent particles. It turns out that the vector angular
momentum decomposes into the sum of the angular momentum
of the center of mass and the rotational angular momentum about
the center of mass, just as the kinetic energy separates into the
kinetic energy of the center of mass and the kinetic energy of
rotation. As in the kinetic energy demonstration, decompose the
position into the vector to the center of mass $\vec X$ and the vectors
from the center of mass to the constituent mass elements $\vec\xi_\alpha$:

$$ \vec x_\alpha = \vec X + \vec\xi_\alpha, \tag{2.45} $$

with velocities

$$ \dot{\vec x}_\alpha = \dot{\vec X} + \dot{\vec\xi}_\alpha. \tag{2.46} $$

Substituting, the angular momentum is

$$ \sum_\alpha m_\alpha(\vec X + \vec\xi_\alpha)\times(\dot{\vec X} + \dot{\vec\xi}_\alpha). \tag{2.47} $$
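Multiplying out the product, and using the fact that $\vec X$ is the
center of mass, so that $\sum_\alpha m_\alpha\vec\xi_\alpha = 0$ and the cross terms vanish,
and that $M = \sum_\alpha m_\alpha$ is the total mass of the body, the angular
momentum is

$$ \vec X\times(M\dot{\vec X}) + \sum_\alpha \vec\xi_\alpha\times(m_\alpha\dot{\vec\xi}_\alpha). \tag{2.48} $$

The angular momentum of the center of mass is

$$ \vec X\times(M\dot{\vec X}), \tag{2.49} $$

and the rotational angular momentum is

$$ \sum_\alpha \vec\xi_\alpha\times(m_\alpha\dot{\vec\xi}_\alpha). \tag{2.50} $$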


We can also reexpress the rotational angular momentum in
terms of the angular velocity vector and the inertia tensor, as
we did for the kinetic energy. Using $\dot{\vec\xi}_\alpha = \vec\omega\times\vec\xi_\alpha$, the rotational
angular momentum is

$$ \vec L = \sum_\alpha m_\alpha\,\vec\xi_\alpha\times(\vec\omega\times\vec\xi_\alpha). \tag{2.51} $$

In terms of components with respect to the basis $\hat e_i$, this is

$$ L_j = \sum_k I_{jk}\,\omega^k, \tag{2.52} $$

where $I_{jk}$ are the components of the inertia tensor (2.14). The
angular momentum and the kinetic energy are expressed in terms 
of the same inertia tensor. 
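Equation (2.52) can be spot-checked numerically (this is only a numerical check, not the derivation asked for in exercise 2.9). The Python sketch sums $m_\alpha\vec\xi_\alpha\times(\vec\omega\times\vec\xi_\alpha)$ over a few point masses and compares the result with $\sum_k I_{jk}\omega^k$. The masses, positions, angular velocity, and function names are hypothetical.

```python
# Sketch: rotational angular momentum (2.51) summed over point masses,
# compared with the inertia-tensor form (2.52), L_j = sum_k I_jk omega^k.

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def inertia_matrix(masses, positions):
    I = [[0.0] * 3 for _ in range(3)]
    for m, xi in zip(masses, positions):
        r2 = sum(c * c for c in xi)
        for j in range(3):
            for k in range(3):
                I[j][k] += m * ((r2 if j == k else 0.0) - xi[j] * xi[k])
    return I


masses = [1.0, 1.0, 2.0]
positions = [(1.0, 0.0, 1.0), (0.0, 2.0, -1.0), (-0.5, -1.0, 0.0)]
omega = [0.4, -0.9, 1.5]

# Direct sum of m xi x (omega x xi).
L_direct = [0.0, 0.0, 0.0]
for m, xi in zip(masses, positions):
    term = cross(xi, cross(omega, xi))
    L_direct = [L + m * t for L, t in zip(L_direct, term)]

# Inertia-tensor form.
I = inertia_matrix(masses, positions)
L_tensor = [sum(I[j][k] * omega[k] for k in range(3)) for j in range(3)]
print(L_direct, L_tensor)
```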

With respect to the principal axis basis, the angular momentum 
components have a particularly simple form: 


$$ L_a = A\omega^a, \tag{2.53} $$
$$ L_b = B\omega^b, \tag{2.54} $$
$$ L_c = C\omega^c. \tag{2.55} $$


Exercise 2.9: 


Verify that the expression (2.52) for the components of the rotational 
angular momentum (2.51) in terms of the inertia tensor is correct. 
We can define procedures to calculate the components of the
angular momentum on the principal axes:


(define ((Euler-state->L-body A B C) local)
  (let ((omega-body (Euler-state->omega-body local)))
    (column-matrix (* A (ref omega-body 0))
                   (* B (ref omega-body 1))
                   (* C (ref omega-body 2)))))


We then transform the components of the angular momentum on
the principal axes to the components on the fixed basis $\hat e_i$:


(define ((Euler-state->L-space A B C) local)
  (let ((angles (coordinate local)))
    (* (Euler->M angles)
       ((Euler-state->L-body A B C) local))))


These procedures are local state functions, like Lagrangians. 


2.9 Motion of a Free Rigid Body 


The kinetic energy, expressed in terms of a suitable set of gen- 
eralized coordinates, is a Lagrangian for a free rigid body. In 
section 2.1 we found that the kinetic energy of a rigid body can 
be written as the sum of the rotational kinetic energy and the 
translational kinetic energy. By choosing one set of coordinates 
to specify the position and another set to specify the orientation,
the Lagrangian becomes a sum of a translational Lagrangian and a
rotational Lagrangian. The Lagrange equations for translational 
motion are not coupled to the Lagrange equations for the rota- 
tional motion. For a free rigid body the translational motion is 
just that of a free particle: uniform motion. Here we concentrate 
on the rotational motion of the free rigid body. We can adopt the 
Euler angles as the coordinates that specify the orientation; the 
rotational kinetic energy was expressed in terms of Euler angles 
in the previous section. 


Conserved quantities 
The Lagrangian for a free rigid body has no explicit time depen- 
dence, so we can deduce that the energy, which is just the kinetic 
energy, is conserved by the motion. 

The Lagrangian does not depend on the Euler angle $\varphi$, so we
can deduce that the momentum conjugate to this coordinate is
conserved. An explicit expression for the momentum conjugate to
$\varphi$ is:


(define Euler-state
  (up 't
      (up 'theta 'phi 'psi)
      (up 'thetadot 'phidot 'psidot)))


(show-expression
 (ref (((partial 2) (T-rigid-body 'A 'B 'C)) Euler-state)
      1))


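$$ \begin{aligned} &A\dot\varphi(\sin\theta)^2(\sin\psi)^2 + A\dot\theta\cos\psi\sin\theta\sin\psi \\ &+ B\dot\varphi(\cos\psi)^2(\sin\theta)^2 - B\dot\theta\cos\psi\sin\theta\sin\psi \\ &+ C\dot\varphi(\cos\theta)^2 + C\dot\psi\cos\theta \end{aligned} $$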


We know that this complicated quantity is conserved by the mo- 
tion of the rigid body because of the symmetries of the Lagrangian. 

If there are no external torques, then we expect that the vector 
angular momentum will be conserved. We can verify this using 
the Lagrangian formulation of the problem. First, we note that
$L_z$ is the same as $p_\varphi$. We can check this by direct calculation:


(print-expression
 (- (ref ((Euler-state->L-space 'A 'B 'C) Euler-state)
         2)
    (ref (((partial 2) (T-rigid-body 'A 'B 'C)) Euler-state)
         1)))
; Value: 0


We know that $p_\varphi$ is conserved because the Lagrangian for the free
rigid body does not mention $\varphi$, so now we know that $L_z$ is conserved.
Since the orientation of the coordinate axes is arbitrary,
we know that if any rectangular component is conserved then all
of them are. So the vector angular momentum is conserved for
the free rigid body.
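The identity $L_z = p_\varphi$, checked symbolically above, can also be spot-checked numerically outside scmutils. This Python code (names and the sample state are hypothetical; the moments $A = 1$, $B = \sqrt2$, $C = 2$ are the ones used later in this section) computes $L_z$ by rotating the body components $(A\omega^a, B\omega^b, C\omega^c)$ to the fixed basis, and $p_\varphi$ by a central difference of the kinetic energy in $\dot\varphi$, which is exact up to rounding because the kinetic energy is quadratic in the velocities.

```python
# Sketch: numerical check that L_z equals p_phi = dT/d(phidot).
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_M(theta, phi, psi):
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


def omega_body(q, qdot):
    theta, _phi, psi = q
    thetadot, phidot, psidot = qdot
    return [thetadot * math.cos(psi) + phidot * math.sin(theta) * math.sin(psi),
            -thetadot * math.sin(psi) + phidot * math.sin(theta) * math.cos(psi),
            phidot * math.cos(theta) + psidot]


def T_rigid(moments, q, qdot):
    return 0.5 * sum(I * w * w for I, w in zip(moments, omega_body(q, qdot)))


def L_space(moments, q, qdot):
    L_body = [I * w for I, w in zip(moments, omega_body(q, qdot))]
    M = euler_to_M(*q)
    return [sum(M[i][j] * L_body[j] for j in range(3)) for i in range(3)]


def p_phi(moments, q, qdot, h=1e-4):
    """Central difference of T in phidot (exact up to rounding here)."""
    up = (qdot[0], qdot[1] + h, qdot[2])
    down = (qdot[0], qdot[1] - h, qdot[2])
    return (T_rigid(moments, q, up) - T_rigid(moments, q, down)) / (2 * h)


moments = (1.0, math.sqrt(2.0), 2.0)    # A, B, C
q, qdot = (1.0, 0.3, -0.4), (0.1, 0.2, 0.3)
print(L_space(moments, q, qdot)[2], p_phi(moments, q, qdot))
```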

Of course, we could have seen this with the help of Noether’s 
theorem (see section 1.8.4). There is a continuous family of rotations
that can transform any orientation into any other orientation.
The orientation of the coordinate axes we used to define the
Euler angles is arbitrary, and the kinetic energy (the Lagrangian) 
is the same for any choice of coordinate system. Thus the situation
meets the requirements of Noether’s theorem, which tells us
that there is a conserved quantity. In particular, the family of 
rotations around each coordinate axis gives us conservation of the 
angular momentum component on that axis. We construct the 
vector angular momentum by combining these contributions. 


Exercise 2.10: Vector angular momentum 

Fill in the details of the argument that Noether’s theorem implies that 
vector angular momentum is conserved by the motion of the free rigid 
body. 


2.9.1 Computing the Motion of Free Rigid Bodies 


Lagrange’s equations for the motion of a free rigid body in terms 
of Euler angles are quite disgusting, so we will not show them 
here. However, we will use the Lagrange equations to explore the 
motion of the free rigid body. 

Before doing this it is worth noting that the equations of motion 
in Euler angles are singular for some configurations, because for 
these configurations the Euler angles are not uniquely defined. If 
we set $\theta = 0$ then an orientation does not correspond to a unique
value of $\varphi$ and $\psi$; only their sum determines the orientation.

The singularity arises in the explicit Lagrange equations when
we attempt to solve for the second derivative of the generalized
coordinates in terms of the generalized coordinates and the generalized
velocities (see section 1.7). The isolation of the second
derivative requires multiplying by the inverse of $\partial_2\partial_2 L$. The determinant
of this quantity becomes zero when the Euler angle $\theta$
is zero.


(show-expression
 (determinant
  (((square (partial 2)) (T-rigid-body 'A 'B 'C))
   Euler-state)))


$$ ABC\,(\sin\theta)^2 $$


So when $\theta$ is zero, we cannot solve for the second derivatives.
When $\theta$ is small, the Euler angles can move very rapidly, and thus
may be difficult to compute reliably. Of course, the motion of the
rigid body is perfectly well behaved for any orientation. This is a
problem of the representation of that motion in Euler angles; it is
a “coordinate singularity.”
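The determinant can be reproduced numerically. Because the body components of the angular velocity are linear in the generalized velocities, $\omega' = J\dot q$ for a matrix $J$ that depends on the coordinates, and the matrix $\partial_2\partial_2 L$ for $L = T$ is $J^\mathsf{T}\,\mathrm{diag}(A, B, C)\,J$ (one can check that $\det J = \sin\theta$). This Python sketch (names and sample values hypothetical) reads off $J$ column by column from the closed-form body components and compares the determinant with $ABC(\sin\theta)^2$, including at $\theta = 0$.

```python
# Sketch: second partials of T with respect to the velocities, and their
# determinant, compared with A B C sin^2(theta).
import math


def omega_body(q, qdot):
    theta, _phi, psi = q
    thetadot, phidot, psidot = qdot
    return [thetadot * math.cos(psi) + phidot * math.sin(theta) * math.sin(psi),
            -thetadot * math.sin(psi) + phidot * math.sin(theta) * math.cos(psi),
            phidot * math.cos(theta) + psidot]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def velocity_hessian(moments, q):
    """J^T diag(moments) J, where J[i][k] = d omega_i / d qdot_k."""
    basis = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    cols = [omega_body(q, e) for e in basis]      # omega' is linear in qdot
    J = [[cols[k][i] for k in range(3)] for i in range(3)]
    return [[sum(moments[i] * J[i][j] * J[i][k] for i in range(3))
             for k in range(3)] for j in range(3)]


A, B, C = 1.0, math.sqrt(2.0), 2.0
for theta in (0.7, 0.1, 0.0):
    H = velocity_hessian((A, B, C), (theta, 0.3, -0.4))
    print(theta, det3(H), A * B * C * math.sin(theta) ** 2)
```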

One solution to this problem is to use another set of Euler-like 
coordinates for which Lagrange’s equations have singularities for 
different orientations, such as those defined in equation (2.40). So 
as the calculation proceeds, if we come close to a singularity in one 
set of coordinates we can switch and use the other set for a while
until it encounters a singularity. This solves the problem, but it
is cumbersome. For the moment we will ignore this problem and 
compute some trajectories, being careful to limit our attention to 
trajectories that avoid the singularities. 

We will compute some trajectories by numerical integration and 
check our integration process by seeing how well energy and an- 
gular momentum are conserved. Then, we will investigate the 
evolution of the components of angular momentum on the prin- 
cipal axis basis. We will discover that we can learn quite a bit 
about the qualitative behavior of rigid bodies by combining the 
information we get from the energy and angular momentum. 

To develop a trajectory from initial conditions we integrate the 
Lagrange equations, as we did in chapter 1. The system derivative 
is obtained from the Lagrangian: 


(define (rigid-sysder A B C)
  (Lagrangian->state-derivative (T-rigid-body A B C)))


The following program monitors the errors in the energy and the 
components of the angular momentum: 


(define ((monitor-errors win A B C L0 E0) state)
  (let ((t (time state))
        (L ((Euler-state->L-space A B C) state))
        (E ((T-rigid-body A B C) state)))
    (plot-point win t (relative-error (ref L 0) (ref L0 0)))
    (plot-point win t (relative-error (ref L 1) (ref L0 1)))
    (plot-point win t (relative-error (ref L 2) (ref L0 2)))
    (plot-point win t (relative-error E E0))))


(define (relative-error value reference-value)
  (if (zero? reference-value)
      (error "Zero reference value -- RELATIVE-ERROR")
      (/ (- value reference-value) reference-value)))


We make a plot window to display the errors: 
(define win (frame 0. 100. -1.e-12 1.e-12))


The default integration method is Bulirsch-Stoer (bulirsch-stoer); 
the integration method used here is quality-controlled Runge- 
Kutta (qcrk4): 


(set! *ode-integration-method* 'qcrk4)


We use evolve to investigate the evolution: 


(let ((A 1.) (B (sqrt 2.)) (C 2.)          ; moments of inertia
      (state0 (up 0.0                      ; initial state
                  (up 1. 0. 0.)
                  (up 0.1 0.1 0.1))))
  (let ((L0 ((Euler-state->L-space A B C) state0))
        (E0 ((T-rigid-body A B C) state0)))
    ((evolve rigid-sysder A B C)
     state0
     (monitor-errors win A B C L0 E0)
     0.1                                   ; step between plotted points
     100.0                                 ; final time
     1.0e-12)))                            ; max local truncation error


The resulting plot of the relative errors in the components of the
angular momentum and in the energy (see figure 2.2) shows that
we have succeeded in controlling the error in the conserved
quantities. This should give us some confidence in the evolved
trajectory.
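For readers without scmutils, the same error-monitoring idea can be sketched in plain Python using only the standard library. This is a swapped-in formulation, not the book's program: rather than integrating the Lagrange equations in Euler angles, it integrates the torque-free Euler equations for the body components of the angular momentum with a fixed-step classical Runge-Kutta method (not qcrk4). Because the body components themselves change, it monitors the energy and the squared magnitude of the angular momentum instead of the spatial components. The helper names and the initial angular momentum (1.0, 0.5, 0.2) are hypothetical.

```python
import math

def euler_rhs(M, I):
    """Torque-free Euler equations for the body components M = (La, Lb, Lc)
    of the angular momentum: dM/dt = M x omega, with omega_i = M_i / I_i."""
    wa, wb, wc = (m / i for m, i in zip(M, I))
    return (M[1] * wc - M[2] * wb,
            M[2] * wa - M[0] * wc,
            M[0] * wb - M[1] * wa)

def rk4_step(M, I, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    def shifted(k, s):
        return tuple(m + s * dm for m, dm in zip(M, k))
    k1 = euler_rhs(M, I)
    k2 = euler_rhs(shifted(k1, h / 2), I)
    k3 = euler_rhs(shifted(k2, h / 2), I)
    k4 = euler_rhs(shifted(k3, h), I)
    return tuple(m + h / 6 * (a + 2 * b + 2 * c + d)
                 for m, a, b, c, d in zip(M, k1, k2, k3, k4))

def energy(M, I):
    return 0.5 * sum(m * m / i for m, i in zip(M, I))

def magnitude_squared(M):
    return sum(m * m for m in M)

def relative_error(value, reference_value):
    if reference_value == 0:
        raise ValueError("Zero reference value -- relative_error")
    return (value - reference_value) / reference_value

def monitor_errors(M0, I, h, t_final):
    """Evolve from M0; return the final state and the largest relative
    errors seen in the energy and in |M|^2."""
    E0, L2_0 = energy(M0, I), magnitude_squared(M0)
    M, worst_E, worst_L2 = M0, 0.0, 0.0
    for _ in range(round(t_final / h)):
        M = rk4_step(M, I, h)
        worst_E = max(worst_E, abs(relative_error(energy(M, I), E0)))
        worst_L2 = max(worst_L2, abs(relative_error(magnitude_squared(M), L2_0)))
    return M, worst_E, worst_L2

I = (1.0, math.sqrt(2.0), 2.0)            # A, B, C as in the text
M, err_E, err_L2 = monitor_errors((1.0, 0.5, 0.2), I, 0.01, 20.0)
print(f"largest relative error: energy {err_E:.1e}, |L|^2 {err_L2:.1e}")
```

Both quantities are exact invariants of the Euler equations, so any drift measures integration error only.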


2.9.2 Qualitative Features of Free Rigid Body Motion 


The evolution of the components of the angular momentum on 
the principal axes has a remarkable property. For almost every 
initial condition the body components of the angular momentum 
periodically trace a simple closed curve. 

We can see this by investigating a number of trajectories, and 
plotting the components of angular momentum of the body on 
the principal axes (see figure 2.3). For most initial conditions
we find a one-dimensional simple closed curve. The trajectories
appear to cross because they are projected. Special initial
conditions produce trajectories, called the separatrix, that
appear to intersect in two points.

To make this figure a number of trajectories of equal energy
were computed. The three-dimensional space of body components
is projected onto a two-dimensional plane for display. Points on
the back of this projection of the ellipsoid of constant energy are
plotted with lower density than points on the front of the ellipsoid.

Figure 2.2 The relative error in energy and in the three spatial
components of the angular momentum versus time. It is interesting
to note that the energy error is one of the three falling curves.

What is going on? The state space for a free rigid body is six-dimensional:
the three Euler angles and their time derivatives.
We know four constants of the motion: the three spatial components
of the angular momentum, $L_x$, $L_y$, and $L_z$, and the energy,
E. Thus, the motion is restricted to a two-dimensional region of
the state space.⁹ Our experiment shows that the components of
the angular momentum trace one-dimensional closed curves in the 
angular-momentum subspace, so there is something more going on 
here. 

The magnitude of the angular momentum is conserved if all of the
components are, so we also have the constant

$$L^2 = L_x^2 + L_y^2 + L_z^2. \qquad (2.56)$$


We expect that for each constant of the motion we reduce by one the di- 
mension of the region of the state space explored by a trajectory. This is 
because a constant of the motion can be used to locally solve for one of the 
state variables in terms of the others. 
Figure 2.3 Trajectories of the components of the angular momentum
vector on the principal axes, projected onto a plane. Each closed curve, 
except for the separatrix, is a different trajectory. All the trajectories 
shown here have the same energy. 


The spatial components of the angular momentum do not change, 
but of course the projections of the angular momentum onto the 
principal axes do change because the axes move as the body moves. 
However, the magnitude of the angular momentum vector is the 
same whether it is computed from components on the fixed basis 
or components on the principal axis basis. So, the combination 


$$L^2 = L_a^2 + L_b^2 + L_c^2, \qquad (2.57)$$


is conserved. 
Using the expressions (2.53-2.55) for the angular momentum
in terms of the components of the angular velocity vector on the
principal axes, the kinetic energy (2.30) can be rewritten in terms
of the angular momentum components on the principal axes:

$$E = \frac{1}{2}\left(\frac{L_a^2}{A} + \frac{L_b^2}{B} + \frac{L_c^2}{C}\right). \qquad (2.58)$$


The two integrals (2.57 and 2.58) provide constraints on how 
the components of the angular momentum vector on the princi- 
pal axes can change. We recognize the angular momentum in- 
tegral (2.57) as the equation of a sphere, and the kinetic energy 
integral (2.58) as the equation for a triaxial ellipsoid. Both inte- 
grals are conserved so the components of the angular momentum 
are constrained to move on the intersection of these two surfaces, 
the energy ellipsoid and the angular momentum sphere. The in- 
tersection of an ellipsoid and a sphere with the same center is 
generically two closed curves, so an orbit is confined to one of 
these curves. This sheds light on the puzzle at the beginning of 
this section. 

Because of our ordering A < B < C, the longest axis of this
triaxial ellipsoid (its semi-axes are $\sqrt{2EA}$, $\sqrt{2EB}$, and
$\sqrt{2EC}$) coincides with the $\hat c$ direction, where all the angular
momentum is along the axis of largest principal moment of
inertia, and the shortest axis of the energy ellipsoid coincides with
the $\hat a$ axis, where all the angular momentum is along the axis of
smallest moment of inertia. Without actually solving the Lagrange
equations, we have found strong constraints on the evolution of the
components of the angular momentum on the principal axes.
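The geometry can be turned into a small decision procedure. A minimal sketch, assuming A < B < C: the energy ellipsoid has squared semi-axes 2EA, 2EB, and 2EC, and the angular momentum sphere has squared radius L², so comparing L² with those three numbers tells which kind of intersection occurs. The function name and its return strings are illustrative only.

```python
import math

def classify(L2, E, A, B, C, rel_tol=1e-12):
    """Classify the intersection of the sphere La^2 + Lb^2 + Lc^2 = L2
    with the ellipsoid La^2/A + Lb^2/B + Lc^2/C = 2E (needs A < B < C)."""
    if not A < B < C:
        raise ValueError("expects A < B < C")
    # Squared semi-axes of the energy ellipsoid along a, b, c.
    a2, b2, c2 = 2 * E * A, 2 * E * B, 2 * E * C

    def close(x, y):
        return math.isclose(x, y, rel_tol=rel_tol)

    if close(L2, a2):
        return "the two points on the a axis"
    if close(L2, c2):
        return "the two points on the c axis"
    if close(L2, b2):
        return "separatrix through the b axis"
    if L2 < a2 or L2 > c2:
        return "empty"
    if L2 < b2:
        return "two closed curves around the a axis"
    return "two closed curves around the c axis"

A, B, C = 1.0, math.sqrt(2.0), 2.0
for L2 in (0.5, 1.0, 1.2, math.sqrt(2.0), 1.8, 2.0, 2.5):
    print(L2, classify(L2, 0.5, A, B, C))
```

Values of L² between 2EA and 2EB give curves around the â axis, values between 2EB and 2EC give curves around the ĉ axis, and the separatrix sits exactly at L² = 2EB.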

To determine how the system evolves along these intersection 
curves we have to use the equations of motion. We observe that 
the evolution of the components of the angular momentum on 
the principal axes depends only on the components of the angular 
momentum on the principal axes, even though the values of these 
components are not enough to completely specify the dynamical 
state. Apparently the dynamics of these components is self-contained,
and we will see that it can be described in terms of a set
of differential equations whose only dynamical variables are the 
components of the angular momentum on the principal axes (see 
section 2.12). 

We note that there are two axes for which the intersection 
curves shrink to a point if we hold the energy constant and vary the 
magnitude of the angular momentum. If the angular momentum 
starts at these points, the integrals constrain the angular momentum
to stay there. These points are equilibrium points for the body
components of the angular momentum. However, these points are 
not equilibrium points for the system as a whole. At these points 
the body is still rotating even though the body components of the 
angular momentum are not changing. This kind of equilibrium is 
called a relative equilibrium. We can also see that if the angular 
momentum is initially slightly displaced from one of these relative 
equilibria then the angular momentum is constrained to stay near 
it on one of the intersection curves. The angular momentum vector
is fixed in space, so the body's principal axis associated with this
equilibrium rotates stably about the angular momentum vector.

At the principal axis with intermediate moment of inertia, the $\hat b$
axis, the intersection curves cross. As we observed, the dynamics
of the components of the angular momentum on the principal axes
forms a self-contained dynamical system. Trajectories of a dynamical
system cannot cross,¹⁰ so if the equations of motion carry the
system along an intersection curve toward the crossing point, the
system can at most approach that point asymptotically.
So without solving any equations we can deduce
that the point of crossing is another relative equilibrium. If the 
angular momentum is initially aligned with the intermediate axis, 
then it stays aligned. If the system is slightly displaced from the 
intermediate axis, then the evolution along the intersection curve 
will take the system far from the relative equilibrium. So rotation
about the axis of intermediate moment of inertia is unstable:
displacements of the angular momentum, however small initially,
become large. Again, the angular momentum vector is
fixed in space, but now the principal axis with the intermediate 
principal moment does not stay close to the angular momentum, 
so the body executes a complicated tumbling motion. 
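The stability argument can be checked numerically. The sketch below is plain Python with hypothetical helper names, and it again uses the swapped-in Euler equations for the body components rather than the book's Lagrangian in Euler angles. It starts the angular momentum a small distance, 10⁻³, from each principal axis and records the largest angle it ever makes with that axis. With A = 1, B = √2, C = 2, the runs near the â and ĉ axes should stay within a few thousandths of a radian, while the run near the intermediate b̂ axis should wander far away.

```python
import math

def euler_rhs(M, I):
    # Torque-free Euler equations: dM/dt = M x omega, omega_i = M_i / I_i.
    wa, wb, wc = (m / i for m, i in zip(M, I))
    return (M[1] * wc - M[2] * wb, M[2] * wa - M[0] * wc, M[0] * wb - M[1] * wa)

def rk4_step(M, I, h):
    # Classical fourth-order Runge-Kutta step.
    k1 = euler_rhs(M, I)
    k2 = euler_rhs(tuple(m + h / 2 * d for m, d in zip(M, k1)), I)
    k3 = euler_rhs(tuple(m + h / 2 * d for m, d in zip(M, k2)), I)
    k4 = euler_rhs(tuple(m + h * d for m, d in zip(M, k3)), I)
    return tuple(m + h / 6 * (a + 2 * b + 2 * c + d)
                 for m, a, b, c, d in zip(M, k1, k2, k3, k4))

def max_angle_from_axis(M0, axis, I, h=0.01, t_final=100.0):
    """Largest angle (radians) between M and principal axis `axis`
    (0 = a, 1 = b, 2 = c) along the trajectory that starts at M0."""
    def angle(M):
        perp = math.sqrt(sum(M[j] ** 2 for j in range(3) if j != axis))
        return math.atan2(perp, M[axis])
    M, worst = M0, angle(M0)
    for _ in range(round(t_final / h)):
        M = rk4_step(M, I, h)
        worst = max(worst, angle(M))
    return worst

I = (1.0, math.sqrt(2.0), 2.0)        # A < B < C, as in the text
eps = 1e-3                            # hypothetical small initial displacement
starts = {"a": ((1.0, eps, 0.0), 0),
          "b": ((eps, 1.0, 0.0), 1),
          "c": ((eps, 0.0, 1.0), 2)}
results = {name: max_angle_from_axis(M0, axis, I)
           for name, (M0, axis) in starts.items()}
for name, worst in results.items():
    print(f"start near {name} axis: largest angle from it = {worst:.3g} rad")
```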

This gives some insight into the mystery of the thrown book 
mentioned at the beginning of the chapter. If one throws a book 
so that it is initially rotating about either the axis with the largest 
or the smallest moment of inertia (the smallest and largest physi- 
cal axes, respectively), the book rotates regularly about that axis. 
However, if the book is thrown so that it is initially rotating about 
the axis of intermediate moment of inertia (the intermediate physical
axis), then the book tumbles, however carefully the book is
thrown. You can try it with this book (but put a rubber band
around it first).

¹⁰ Systems of ODEs that satisfy a Lipschitz condition have unique solutions.

Before moving on, we can make some further physical deduc- 
tions. Suppose a freely rotating body is subject to some sort of 
internal friction that dissipates energy, but conserves the angular 
momentum. For example, real bodies flex as they spin. If the 
spin axis moves with respect to the body then the flexing changes 
with time, and this changing distortion converts kinetic energy 
of rotation into heat. Internal processes do not change the total 
angular momentum of the system. If we hold the magnitude of 
the angular momentum fixed but gradually decrease the energy 
then the curve of intersection on which the system moves gradu- 
ally deforms. For a given angular momentum there is a lower limit 
on the energy; the energy cannot be so low that there are no in- 
tersections. For this lowest energy the intersection of the angular 
momentum sphere and the energy ellipsoid is a pair of points on 
the axis of maximum moment of inertia. With energy dissipation, 
a freely rotating physical body eventually ends up with the lowest 
energy consistent with the given angular momentum, which is ro- 
tation about the principal axis with the largest moment of inertia 
(typically the shortest physical axis). 
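The dissipative argument can be illustrated with a toy model. The damping term below is an assumption chosen for illustration, not a model of any real flexing body: adding k M × (M × ω) to the free Euler equations leaves |M|² unchanged (every term is M crossed with something) while making dE/dt = −k|M × ω|² ≤ 0. Started near the axis of smallest moment, the body components of the angular momentum should drift to the ĉ axis and the energy should approach its lower limit L²/(2C). The strength k = 1, the integration time, and the initial state are hypothetical.

```python
import math
from functools import partial

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def damped_rhs(M, I, k):
    """Hypothetical toy model: dM/dt = M x w + k M x (M x w), w_i = M_i / I_i.
    Both terms are M crossed with something, so |M|^2 is conserved, and
    dE/dt = -k |M x w|^2 <= 0, so the energy never increases."""
    w = tuple(m / i for m, i in zip(M, I))
    Mw = cross(M, w)
    return tuple(p + k * q for p, q in zip(Mw, cross(M, Mw)))

def rk4_step(f, M, h):
    """Classical fourth-order Runge-Kutta step for dM/dt = f(M)."""
    k1 = f(M)
    k2 = f(tuple(m + h / 2 * d for m, d in zip(M, k1)))
    k3 = f(tuple(m + h / 2 * d for m, d in zip(M, k2)))
    k4 = f(tuple(m + h * d for m, d in zip(M, k3)))
    return tuple(m + h / 6 * (a + 2 * b + 2 * c + d)
                 for m, a, b, c, d in zip(M, k1, k2, k3, k4))

def energy(M, I):
    return 0.5 * sum(m * m / i for m, i in zip(M, I))

I = (1.0, math.sqrt(2.0), 2.0)        # A < B < C
k = 1.0                               # hypothetical dissipation strength
M0 = (1.0, 0.1, 0.1)                  # start near the axis of smallest moment
L2 = sum(m * m for m in M0)
rhs = partial(damped_rhs, I=I, k=k)
M = M0
for _ in range(round(200.0 / 0.01)):
    M = rk4_step(rhs, M, 0.01)

print("final body components of L:", M)
print("energy:", energy(M0, I), "->", energy(M, I),
      "lower limit L^2/(2C):", L2 / (2 * I[2]))
```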

Thus, we expect that given enough time all freely rotating phys- 
ical bodies will end up rotating about the axis of largest moment of 
inertia. You can demonstrate this to your satisfaction by twirling 
a small bottle containing some viscous fluid, such as correction 
fluid. What you will find is that, whatever spin you try to put 
on the bottle, it will reorient itself so that the axis of the largest 
moment of inertia is aligned with the spin axis. Remarkably, this 
is very nearly true of almost every body in the solar system for 
which there is enough information to decide. The deviations from 
principal axis rotation for the Earth are tiny: the angle between
the angular momentum vector and the $\hat c$ axis for the Earth is less
than one arc-second.¹¹ In fact, the evidence is that all of the
planets, the Moon and all of the other natural satellites, and almost
all of the asteroids rotate very nearly about the largest moment 
of inertia. We have deduced that this is to be expected using 
an elementary argument. There are exceptions. Comets typically 
do not rotate about the largest moment. As they are heated by
the sun, material spews out from localized jets, and the back
reaction from these jets changes the rotation state. Among the natural
satellites, the only known exception is Saturn’s satellite Hyperion,
which is tumbling chaotically. Hyperion is especially out-of-round
and subject to strong gravitational torques from Saturn.

¹¹ The deviation of the angular momentum from the principal axis may be due
to a number of effects: earthquakes, atmospheric tides, ...


2.10 Axisymmetric Tops 


We have all played with a top at one time or another. For the 
purposes of analysis we will consider an idealized top that does 
not wander around. Thus, an ideal top is a rotating rigid body, 
one point of which is fixed in space. Furthermore, the center of 
mass of the top is not at the fixed point, which is the center of 
rotation, and there is a uniform gravitational acceleration. 

For our top we can take the Lagrangian to be the difference 
of the kinetic energy and the potential energy. We already know 
how to write the kinetic energy—what is new here is that we must 
express the potential energy in terms of the configuration. In the 
case of a body in a uniform gravitational field this is easy. The
potential energy is the sum of “mgh” for all the constituent particles:

$$\sum_\alpha m_\alpha g h_\alpha, \qquad (2.59)$$

where g is the gravitational acceleration, $h_\alpha = \vec x_\alpha \cdot \hat z$, and where
the unit vector $\hat z$ indicates which way is up. Rewriting the vector
to the constituents in terms of the vector $\vec X$ to the center of mass,
the potential energy is:

$$\sum_\alpha m_\alpha g\,(\vec X + \vec\xi_\alpha)\cdot\hat z
= gM\vec X\cdot\hat z + g\Big(\sum_\alpha m_\alpha\vec\xi_\alpha\Big)\cdot\hat z
= gM\vec X\cdot\hat z, \qquad (2.60)$$

where the last sum is zero because the center of mass is the origin
of $\vec\xi_\alpha$. So the potential energy of a body in a gravitational field
with uniform acceleration is very simple: it is just Mgh, where M
is the total mass, and $h = \vec X\cdot\hat z$ is the height of the center of mass.
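A quick numerical check of (2.60), assuming a hypothetical body of 50 randomly placed point masses: summing m g h over the constituents gives the same number as M g times the height of the center of mass.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def potential_by_constituents(masses, positions, g, up=(0.0, 0.0, 1.0)):
    """Equation (2.59): sum of m g h, with h = x . up for each constituent."""
    return sum(m * g * dot(x, up) for m, x in zip(masses, positions))

def potential_by_center_of_mass(masses, positions, g, up=(0.0, 0.0, 1.0)):
    """Equation (2.60): M g (X . up), with X the center of mass."""
    M = sum(masses)
    X = tuple(sum(m * x[j] for m, x in zip(masses, positions)) / M
              for j in range(3))
    return M * g * dot(X, up)

rng = random.Random(0)                 # hypothetical random body, fixed seed
masses = [rng.uniform(0.1, 2.0) for _ in range(50)]
positions = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(50)]
V1 = potential_by_constituents(masses, positions, 9.8)
V2 = potential_by_center_of_mass(masses, positions, 9.8)
print(V1, V2)
```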
Figure 2.4 An axisymmetric top is a symmetrical rigid body in a
uniform gravitational field with one point of the body fixed in space. 
The Euler angles used to specify the configuration are indicated. 


Here we consider an axisymmetric top (see figure 2.4). Such 
a top has an axis of symmetry of the mass distribution, so the 
center of mass is on the symmetry axis, and the fixed point is also 
on the axis of symmetry. 

In order to write the Lagrangian we need to choose a set of 
generalized coordinates. If we choose them well we can take ad- 
vantage of the symmetries of the problem. If the Lagrangian does 
not depend on a particular coordinate the conjugate momentum 
is conserved, and the complexity of the system is reduced. 

The axisymmetric top has two apparent symmetries. The fact 
that the mass distribution is axisymmetric implies that neither 
the kinetic nor potential energy is sensitive to the orientation of 
the top about that symmetry axis. Additionally, the kinetic and 
potential energy are insensitive to a rotation of the physical system 
about the vertical axis, because the gravitational field is uniform. 

We can take advantage of these symmetries by choosing ap- 
propriate coordinates, and we already have a coordinate system 
that does the job: the Euler angles.¹² We choose the reference
orientation so that the symmetry axis is vertical. The first Euler
angle $\psi$ expresses a rotation about the symmetry axis. The next
Euler angle $\theta$ is the tilt of the symmetry axis of the top from the
vertical. The third Euler angle $\phi$ expresses a rotation of the top
about the z axis. The symmetries of the problem imply that the
first and third Euler angles do not appear in the Lagrangian. As a
consequence the momenta conjugate to these angles are conserved
quantities. Let’s work out the details.

First, we work out the Lagrangian explicitly. The general form 
of the kinetic energy has been worked out, but here there is one 
twist. The top is constrained so that it pivots about a fixed point 
that is not at the center of mass. So the moments of inertia that 
enter the kinetic energy are the moments of inertia of the top 
with respect to the pivot point, not about the center of mass. If 
we know the moments of inertia about the center of mass we can 
write the moments of inertia about the pivot in terms of them (see 
exercise 2.2). So let’s assume the principal moments of inertia of 
the top about the pivot are A, B, and C, and A = B because of 
the symmetry.¹³ We can use the computer to help us figure out
the Lagrangian for this special case: 


(show-expression
 ((T-rigid-body 'A 'A 'C)
  (up 't
      (up 'theta 'phi 'psi)
      (up 'thetadot 'phidot 'psidot))))


$$\tfrac12 A\dot\phi^2(\sin\theta)^2 + \cos\theta\left(\tfrac12 C\dot\phi^2\cos\theta + C\dot\phi\dot\psi\right) + \tfrac12 A\dot\theta^2 + \tfrac12 C\dot\psi^2$$


We can rearrange this a bit to get 


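$$T(t; \theta, \phi, \psi; \dot\theta, \dot\phi, \dot\psi)
= \tfrac12 A\left(\dot\theta^2 + \dot\phi^2\sin^2\theta\right) + \tfrac12 C\left(\dot\psi + \dot\phi\cos\theta\right)^2. \qquad (2.61)$$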


That the axisymmetric top can be solved in Euler angles is, no doubt, the 
reason for the traditional choice of the definition of the Euler angles. For other 
problems, the Euler angles may offer no particular advantage. 


¹³ Here, we do not require that C be larger than A = B, because they are not
measured with respect to the center of mass. 
In terms of Euler angles, the potential energy is

$$V(t; \theta, \phi, \psi; \dot\theta, \dot\phi, \dot\psi) = MgR\cos\theta, \qquad (2.62)$$

where R is the distance of the center of mass from the pivot. The
Lagrangian is L = T − V. We see that the Lagrangian is indeed
independent of $\phi$ and $\psi$, as expected.

There is no particular reason to look at the Lagrange equations. 
We can assign that job to the computer when needed. However, we 
have already seen that it may be useful to examine the conserved 
quantities associated with the symmetries. 

The energy is conserved, because the Lagrangian has no ex- 
plicit time dependence. Also, the energy is the sum of the kinetic 
and potential energy E = T + V, because the kinetic energy is 
a homogeneous quadratic form in the generalized velocities. The 
energy is 


$$E = \tfrac12 A\left(\dot\theta^2 + \dot\phi^2\sin^2\theta\right) + \tfrac12 C\left(\dot\psi + \dot\phi\cos\theta\right)^2 + MgR\cos\theta. \qquad (2.63)$$


Two of the generalized coordinates do not appear in the La- 
grangian, so there are two conserved momenta. The momentum 
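conjugate to $\phi$ is

$$p_\phi = \left(A\sin^2\theta + C\cos^2\theta\right)\dot\phi + C\dot\psi\cos\theta. \qquad (2.64)$$

The momentum conjugate to $\psi$ is

$$p_\psi = C\left(\dot\psi + \dot\phi\cos\theta\right). \qquad (2.65)$$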
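Because the conjugate momenta are partial derivatives of T with respect to the velocities, (2.64) and (2.65) can be checked with a central difference. A sketch in plain Python; the sample state and the moments A and C are hypothetical. Since T is quadratic in the velocities, the central difference agrees with the exact derivative up to roundoff.

```python
import math

def T_top(theta, thetadot, phidot, psidot, A, C):
    """Kinetic energy (2.61) of the axisymmetric top about its pivot."""
    return (0.5 * A * (thetadot**2 + (phidot * math.sin(theta))**2)
            + 0.5 * C * (psidot + phidot * math.cos(theta))**2)

def p_phi(theta, phidot, psidot, A, C):
    """Momentum conjugate to phi, equation (2.64)."""
    return ((A * math.sin(theta)**2 + C * math.cos(theta)**2) * phidot
            + C * psidot * math.cos(theta))

def p_psi(theta, phidot, psidot, C):
    """Momentum conjugate to psi, equation (2.65)."""
    return C * (psidot + phidot * math.cos(theta))

def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical sample state and moments of inertia.
A, C = 1.3, 0.7
theta, thetadot, phidot, psidot = 0.4, 0.2, -1.1, 3.0

num_p_phi = central_difference(
    lambda v: T_top(theta, thetadot, v, psidot, A, C), phidot)
num_p_psi = central_difference(
    lambda v: T_top(theta, thetadot, phidot, v, A, C), psidot)
print(num_p_phi, p_phi(theta, phidot, psidot, A, C))
print(num_p_psi, p_psi(theta, phidot, psidot, C))
```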


The state of the system at a moment is specified by the tuple
$(t; \theta, \phi, \psi; \dot\theta, \dot\phi, \dot\psi)$. The two coordinates $\phi$ and $\psi$ that do not
appear in the Lagrangian do not appear in the Lagrange equations
or the conserved momenta. So the evolution of the remaining
four state variables, $\theta$, $\dot\theta$, $\dot\phi$, and $\dot\psi$, depends only on those
remaining state variables. This subsystem for the top has a
four-dimensional state space. The variables that did not appear in the
Lagrangian can be determined by integrating the derivatives of 
these variables, which are determined separately by solving the 
independent subsystem. 
The evolution of the top is described by a four-dimensional
subsystem and two auxiliary quadratures.¹⁴ This subdivision is a
consequence of choosing generalized coordinates that incorporate 
the symmetries. However, the choice of generalized coordinates 
that incorporate the symmetries also gives conserved momenta. 
We can make use of these momenta to further simplify the for- 
mulation of the problem. Each integral can be used to locally 
eliminate one dimension of the subsystem. In this case the sub- 
system has four dimensions and there are three integrals, so the 
system can be completely reduced to quadratures. For the top, 
this can be done analytically, but we think it is a waste of time to 
do it. Rather, we are interested in extracting interesting features 
of the motion. We concentrate on the energy integral and use 
the two conserved momenta to eliminate $\dot\phi$ and $\dot\psi$. After a bit of
algebra we find: 


$$E = \tfrac12 A\dot\theta^2 + \frac{(p_\phi - p_\psi\cos\theta)^2}{2A\sin^2\theta} + \frac{p_\psi^2}{2C} + MgR\cos\theta. \qquad (2.66)$$
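A numerical spot check of (2.66) against (2.63), using the top parameters and initial state quoted later in this section (as read from the text), with the momenta computed from (2.64) and (2.65). The two expressions should agree to roundoff. This is a sanity check, not a substitute for the algebra asked for in exercise 2.12.

```python
import math

A, C, MgR = 3.28e-4, 6.60e-5, 0.0456   # values quoted in the text (SI units)

def energy_from_velocities(theta, thetadot, phidot, psidot):
    """Energy (2.63)."""
    return (0.5 * A * (thetadot**2 + (phidot * math.sin(theta))**2)
            + 0.5 * C * (psidot + phidot * math.cos(theta))**2
            + MgR * math.cos(theta))

def energy_from_momenta(theta, thetadot, p_phi, p_psi):
    """Energy (2.66), with phidot and psidot eliminated."""
    return (0.5 * A * thetadot**2
            + (p_phi - p_psi * math.cos(theta))**2 / (2 * A * math.sin(theta)**2)
            + p_psi**2 / (2 * C)
            + MgR * math.cos(theta))

theta, thetadot, phidot, psidot = 0.1, 0.0, -15.0, 140.0
p_phi = ((A * math.sin(theta)**2 + C * math.cos(theta)**2) * phidot
         + C * psidot * math.cos(theta))                 # (2.64)
p_psi = C * (psidot + phidot * math.cos(theta))          # (2.65)
E1 = energy_from_velocities(theta, thetadot, phidot, psidot)
E2 = energy_from_momenta(theta, thetadot, p_phi, p_psi)
print(E1, E2)
```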


Along a path $\theta$, where $D\theta(t)$ is substituted for $\dot\theta$, this is an
ordinary differential equation for $\theta$. This differential equation involves
various constants, some of which are set by the initial conditions
of the other state variables. The solution of the differential equation
for $\theta$ involves no more than ordinary integrals. So the top is
essentially solved. We could continue this argument to obtain the
qualitative behavior of $\theta$: using the energy (2.66), we can plot the
trajectories in the plane of $\dot\theta$ versus $\theta$, and see that the motion
of $\theta$ is simply periodic. However, we will defer continuing along
this path until chapter 3, when we have developed more tools for
analysis.

Let’s get real. Let’s make a top out of a disk of aluminum with a
steel rod through the center to make the pivot. Measuring the top
very carefully we find that the moment of inertia of the top about
the symmetry axis is about $6.60 \times 10^{-5}$ kg m², and the moment
of inertia about the pivot point is about $3.28 \times 10^{-4}$ kg m². The
combination gMR is about 0.0456 kg m² s⁻². We spin the top
up with an initial angular velocity of $\dot\psi = 140$ radians/second
(about 1337 rpm). The top initially has $\dot\theta = \phi = \psi = 0$ and
is initially tilted with $\theta = 0.1$ radians. We then kick it so that
$\dot\phi = -15$ radians/second. Figures 2.5-2.8 display aspects of the
evolution of the top for 2 seconds. The tilt of the top (measured by
$\theta$) varies in a periodic manner. The orientation about the vertical
is measured by $\phi$: we see that the top also precesses, and the
rate of precession varies with $\theta$. We also see that as the top bobs
up and down the rate of rotation of the top oscillates: the top
spins faster when it is more vertical. The plot of tilt versus the
precession angle shows that in this case the top executes a looping
motion. If we do not kick it, but just let it drop, then the loop
disappears, leaving just a cusp. If we kick it in the other direction,
then there is no cusp or any looping motion.

¹⁴ Traditionally, evaluating a definite integral is known as performing a quadrature.

Figure 2.5 The tilt angle $\pi - \theta$ of the top versus time. The tilt of the
top varies periodically.

Figure 2.6 The precession angle $\phi$ of the top versus time. The top
precesses nonuniformly: the rate of precession varies as the tilt varies.

Figure 2.7 The rate of rotation $\dot\psi$ of the top versus time. The rate of
rotation of the top changes periodically, as the tilt of the top varies.

Figure 2.8 An idea of the actual motion of the top is obtained by
plotting the tilt angle $\pi - \theta$ versus the precession angle $\phi$. This is a
“latitude-longitude” map showing the path of the center of mass of the
top. We see that though the top has a net precession it executes a
looping motion as it precesses.
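The energy (2.66) also bounds the nutation without any integration. With $p_\phi$ and $p_\psi$ fixed, the motion is confined to angles where the effective potential U(θ) = (p_φ − p_ψ cos θ)²/(2A sin²θ) + MgR cos θ does not exceed E − p_ψ²/(2C). The sketch below locates those turning points for the aluminum top, taking the parameters and initial conditions as read from the paragraph above. The grid-scan-plus-bisection root finder and all helper names are illustrative choices, not part of the book's software.

```python
import math

# Parameters and initial state as read from the text (SI units).
A, C, MgR = 3.28e-4, 6.60e-5, 0.0456
theta0, thetadot0, phidot0, psidot0 = 0.1, 0.0, -15.0, 140.0

# Conserved momenta (2.64) and (2.65).
p_psi = C * (psidot0 + phidot0 * math.cos(theta0))
p_phi = ((A * math.sin(theta0)**2 + C * math.cos(theta0)**2) * phidot0
         + C * psidot0 * math.cos(theta0))

def U(theta):
    """Effective potential: the part of (2.66) that depends on theta alone."""
    return ((p_phi - p_psi * math.cos(theta))**2 / (2 * A * math.sin(theta)**2)
            + MgR * math.cos(theta))

# E minus the constant p_psi^2/(2C); along the motion
# 1/2 A thetadot^2 = E_eff - U(theta) >= 0.
E_eff = 0.5 * A * thetadot0**2 + U(theta0)

def turning_points(n=4000, iterations=60):
    """Angles in (0, pi) where U(theta) = E_eff: scan a grid for sign
    changes of U - E_eff, then refine each bracket by bisection."""
    def f(theta):
        return U(theta) - E_eff
    grid = [j * math.pi / n for j in range(1, n)]
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        if (f(lo) > 0) == (f(hi) > 0):
            continue                      # no sign change in this cell
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if (f(lo) > 0) == (f(mid) > 0):
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

roots = turning_points()
print("theta oscillates between", roots)
```

Because the top starts with $\dot\theta = 0$, one turning point is the initial angle itself; the other marks the opposite extreme of the nutation.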


Exercise 2.11: Kinetic energy of the top 

We have asserted, without proof, that the kinetic energy of the top is 
the kinetic energy of rotation about the pivot point. Show that this is 
the same as the sum of the rotational kinetic energy about its center of 
mass and the kinetic energy of the motion of the center of mass. 


Exercise 2.12: Nutation of the top 
a. Carry out the algebra to obtain the energy (2.66) in terms of $\theta$ and $\dot\theta$.


b. Numerically integrate the Lagrange equations for the top to obtain
figure 2.5, $\theta$ versus time.


c. Note that the energy is a differential equation for $\dot\theta$ in terms of $\theta$,
with conserved quantities $p_\phi$, $p_\psi$, and E determined by initial conditions.
Can we use this differential equation to obtain $\theta$ as a function of time?
Explain.


Exercise 2.13: Precession of the top 

Consider a top that is rotating so that 0 is constant. 

a. Using the angular momentum integrals, compute the rate of precession $\dot\phi$.


b. Assume that $\dot\psi$ is very large. Develop an approximate formula for the
precession rate by equating the rate of change of the angular momentum
to the gravitational torque on the center of mass.


c. Numerically integrate the top and check your estimate. Investigate
how the rate of precession varies with $\theta$ keeping other inputs fixed.
2.11 Spin-Orbit Coupling


The rotation of planets and natural satellites is affected by the 
gravitational forces from other celestial bodies. As an extended 
application of our development of the equations governing the mo- 
tion of forced rigid bodies we consider the rotation of celestial 
objects subject to gravitational forces. 

We first develop the form of the potential energy for the grav- 
itational interaction of an extended body with an external point 
mass. With this potential energy and the usual rigid body kinetic 
energy we can form Lagrangians that model a number of systems. 
We will take an initial look at the rotation of the Moon and Mer- 
cury; later, we will return to study these systems after we have 
developed more tools. 


2.11.1 Development of the Potential Energy 


The first task is to develop convenient expressions for the gravi- 
tational potential energy of the interaction of a rigid body with 
a distant mass point. A rigid body can be thought of as made 
of a large number of mass elements, subject to rigid coordinate 
constraints. We have seen that the kinetic energy of a rigid body 
is conveniently expressed in terms of the moments of inertia of 
the body and the angular velocity vector, which in turn can be 
represented in terms of a suitable set of generalized coordinates. 
The potential energy can be developed in a similar manner. We 
first represent the potential energy in terms of moments of the 
mass distribution and later introduce generalized coordinates as 
particular parameters of the potential energy. 

The gravitational potential energy of a mass point and a rigid 
body (see figure 2.9) is the sum of the potential energy of the mass 
point with each mass element of the body: 


$$-\sum_\alpha \frac{GM'm_\alpha}{r_\alpha}, \qquad (2.67)$$

where M’ is the mass of the external point mass, $r_\alpha$ is the distance
between the point mass and the constituent mass element with
index α, $m_\alpha$ is the mass of this constituent element, and G is
the gravitational constant. Let R be the distance of the center
of mass of the rigid body to the point mass; R is the magnitude
of the vector $\vec x - \vec X$, where the external mass point has position
$\vec x$, and the center of mass has position $\vec X$. The vector from the
center of mass to the constituent with index α is $\vec\xi_\alpha$, and has
magnitude $\xi_\alpha$. The distance $r_\alpha$ is then given by the law of cosines
$r_\alpha^2 = R^2 + \xi_\alpha^2 - 2\xi_\alpha R\cos\theta_\alpha$, where $\theta_\alpha$ is the angle between
$\vec x - \vec X$ and $\vec\xi_\alpha$. The potential energy is then

$$-GM'\sum_\alpha \frac{m_\alpha}{\left(R^2 + \xi_\alpha^2 - 2\xi_\alpha R\cos\theta_\alpha\right)^{1/2}}. \qquad (2.68)$$

Figure 2.9 The gravitational potential energy of a mass point and a
rigid body is the sum of the gravitational potential energy of the mass
point with each constituent mass element of the rigid body.


This is complete, but we need to find a representation that does 
not mention each constituent. 

Typically, the size of celestial bodies is small compared to the 
separation between them. We can make use of this to find a more 
compact representation of the potential energy. If we expand the 
potential energy in the small ratio $\xi_\alpha/R$ we find¹⁵


$$-\frac{GM'}{R}\sum_\alpha m_\alpha \sum_{l=0}^{\infty}\left(\frac{\xi_\alpha}{R}\right)^l P_l(\cos\theta_\alpha), \qquad (2.69)$$


¹⁵ The Legendre polynomials $P_l$ may be obtained by expanding $(1 + y^2 - 2yx)^{-1/2}$
as a power series in y. The coefficient of $y^l$ is $P_l(x)$. The first few
Legendre polynomials are: $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \tfrac32 x^2 - \tfrac12$, and so
on. The rest satisfy the recurrence relation
$l P_l(x) = (2l - 1)xP_{l-1}(x) - (l - 1)P_{l-2}(x)$.
where $P_l$ is the lth Legendre polynomial. Interchanging the order
of the summations:

$$-\frac{GM'}{R}\sum_{l=0}^{\infty}\frac{1}{R^l}\sum_\alpha m_\alpha \xi_\alpha^l P_l(\cos\theta_\alpha). \qquad (2.70)$$


Successive terms in this expansion of the potential energy typically 
decrease very rapidly because celestial bodies are small compared 
to the separation between them. We can compute an upper bound 
to the size of these terms by replacing each factor in the sum 
over α by an upper bound. The Legendre polynomials all have
magnitudes no greater than one for arguments in the range −1 to 1. The
distances $\xi_\alpha$ are all less than some maximum extent of the body
$\xi_{\max}$. The sum over $m_\alpha$ times these upper bounds is just the total
mass M times the upper bounds. Thus

$$\left|\sum_\alpha m_\alpha \xi_\alpha^l P_l(\cos\theta_\alpha)\right| \le M\xi_{\max}^l. \qquad (2.71)$$

We see that the upper bound on successive terms decreases by a
factor $\xi_{\max}/R$. Successive terms may be smaller still. For large
bodies the gravitational force is strong enough to overcome the 
internal material strength of the body, so the body, over time, 
becomes nearly spherical. Successive terms in the expansion of 
the potential are measures of the deviation of the mass distribu- 
tion from a spherical mass distribution. Thus for large bodies 
the higher order terms are small because the bodies are nearly 
spherical. 
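The expansion (2.69) rests on the generating-function identity 1/r_α = (1/R) Σ_l (ξ_α/R)^l P_l(cos θ_α). The sketch below builds P_l from the recurrence in the footnote and compares truncations of the series against the law-of-cosines distance for a single hypothetical constituent with ξ/R = 0.1. Because |P_l| ≤ 1, the error after keeping terms through l should stay below the geometric tail (ξ/R)^(l+1)/(1 − ξ/R), divided by R.

```python
import math

def legendre(l_max, x):
    """[P_0(x), ..., P_l_max(x)] from l P_l = (2l - 1) x P_{l-1} - (l - 1) P_{l-2}."""
    P = [1.0, x]
    for l in range(2, l_max + 1):
        P.append(((2 * l - 1) * x * P[l - 1] - (l - 1) * P[l - 2]) / l)
    return P[:l_max + 1]

def inverse_distance_exact(R, xi, cos_theta):
    """1/r from the law of cosines r^2 = R^2 + xi^2 - 2 xi R cos(theta)."""
    return 1.0 / math.sqrt(R * R + xi * xi - 2 * xi * R * cos_theta)

def inverse_distance_series(R, xi, cos_theta, l_max):
    """Truncation of 1/r = (1/R) sum_l (xi/R)^l P_l(cos theta)."""
    P = legendre(l_max, cos_theta)
    return sum((xi / R) ** l * P[l] for l in range(l_max + 1)) / R

# Hypothetical constituent at one tenth of the separation.
R, xi, cos_theta = 1.0, 0.1, 0.3
exact = inverse_distance_exact(R, xi, cos_theta)
for l_max in range(5):
    err = abs(inverse_distance_series(R, xi, cos_theta, l_max) - exact)
    print(l_max, err, (xi / R) ** (l_max + 1))
```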

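Consider the first few terms in l. For l = 0 the sum over α just
gives the total mass M of the rigid body. For l = 1 the sum over
α is zero, as a consequence of choosing the origin of the $\vec\xi_\alpha$ to be
the center of mass. For l = 2 we have to do a little more work.
The sum involves second moments of the mass distribution, and
can be written in terms of moments of inertia of the rigid body:

$$\begin{aligned}
\sum_\alpha m_\alpha\xi_\alpha^2 P_2(\cos\theta_\alpha)
&= \sum_\alpha m_\alpha\xi_\alpha^2\left(\tfrac32\cos^2\theta_\alpha - \tfrac12\right) \\
&= \sum_\alpha m_\alpha\xi_\alpha^2\left(1 - \tfrac32\sin^2\theta_\alpha\right) \\
&= \tfrac12\left(A + B + C - 3I\right),
\end{aligned} \qquad (2.72)$$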
where A, B, and C are the principal moments of inertia, and I
is the moment of inertia of the rigid body about the line from
the center of mass of the body to the external point mass. The
moment I depends on the orientation of the rigid body relative to
the line between the bodies. The contributions to the potential
energy up to l = 2 are then¹⁶


$$-\frac{GMM'}{R} - \frac{GM'}{2R^3}\left(A + B + C - 3I\right). \qquad (2.73)$$

Let $\alpha = \cos\theta_a$, $\beta = \cos\theta_b$, and $\gamma = \cos\theta_c$ be the direction cosines
of the angles $\theta_a$, $\theta_b$, and $\theta_c$ between the principal axes $\hat a$, $\hat b$, and $\hat c$
and the line between the center of mass and the point mass.¹⁷ A
little algebra shows $I = \alpha^2 A + \beta^2 B + \gamma^2 C$. The potential energy
is then

$$-\frac{GMM'}{R} - \frac{GM'}{2R^3}\left[(1 - 3\alpha^2)A + (1 - 3\beta^2)B + (1 - 3\gamma^2)C\right]. \qquad (2.74)$$

This is a good first approximation to the potential energy of
interaction for most situations in the solar system; if we intended to
land on the Moon we would probably want to take into account
higher order terms in the expansion.
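A numerical check of (2.74), assuming a hypothetical body made of three pairs of point masses placed symmetrically on the coordinate axes, so that the center of mass is at the origin and those axes are principal axes. It compares the moment about the line to the point mass computed directly with α²A + β²B + γ²C, and compares the exact sum (2.67) with the approximation (2.74) at a separation of 50 body sizes. Units with G = M′ = 1 are an assumption. For this symmetric body the l = 3 term vanishes, so the remaining error should be of order (ξ_max/R)⁴.

```python
import math

# Hypothetical rigid body: pairs of point masses on the three principal axes.
parts = [(1.0, (1.0, 0.0, 0.0)), (1.0, (-1.0, 0.0, 0.0)),
         (2.0, (0.0, 0.5, 0.0)), (2.0, (0.0, -0.5, 0.0)),
         (0.5, (0.0, 0.0, 0.8)), (0.5, (0.0, 0.0, -0.8))]
M = sum(m for m, _ in parts)
A = sum(m * (y * y + z * z) for m, (x, y, z) in parts)
B = sum(m * (x * x + z * z) for m, (x, y, z) in parts)
C = sum(m * (x * x + y * y) for m, (x, y, z) in parts)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_potential(R, n, G=1.0, M_ext=1.0):
    """Equation (2.67) for a point mass at R n (n a unit vector)."""
    x = tuple(R * c for c in n)
    return -G * M_ext * sum(m / math.dist(x, xi) for m, xi in parts)

def maccullagh(R, n, G=1.0, M_ext=1.0):
    """Equation (2.74); n = (alpha, beta, gamma) are the direction cosines."""
    al, be, ga = n
    quad = (1 - 3 * al**2) * A + (1 - 3 * be**2) * B + (1 - 3 * ga**2) * C
    return -G * M * M_ext / R - G * M_ext / (2 * R**3) * quad

n = tuple(c / math.sqrt(14.0) for c in (1.0, 2.0, 3.0))   # hypothetical direction
I_direct = sum(m * (dot(xi, xi) - dot(xi, n)**2) for m, xi in parts)
I_cosines = n[0]**2 * A + n[1]**2 * B + n[2]**2 * C
R = 50.0
V = exact_potential(R, n)
print(I_direct, I_cosines)
print("relative error, (2.74):", abs(V - maccullagh(R, n)) / abs(V))
print("relative error, monopole only:", abs(V + M / R) / abs(V))
```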


Exercise 2.14: 


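a. Fill in the details that show that the sum over constituents in
equation (2.72) can be expressed as written in terms of moments of inertia.
In particular, show that

$$\sum_\alpha m_\alpha \xi_\alpha \cos\theta_\alpha = 0, \qquad \sum_\alpha m_\alpha \xi_\alpha^2 = \tfrac12(A + B + C),$$

and that

$$\sum_\alpha m_\alpha \xi_\alpha^2 \sin^2\theta_\alpha = I.$$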


¹⁶ This approximate representation of the potential energy is sometimes called
MacCullagh’s formula.


¹⁷ Watch out, we just reused α. It was also used as the constituent index.
b. Show that if the principal moments of inertia of a rigid body are A,
B, and C, then the moment of inertia about an axis that goes through
the center of mass of the body with direction cosines α, β, and γ relative
to the principal axes is

$$I = \alpha^2 A + \beta^2 B + \gamma^2 C.$$


2.11.2 Rotation of the Moon and Hyperion 


The approximation to the potential energy that we have derived 
can be used for a number of different problems. For instance, it 
can be used to investigate the effect of oblateness on the evolution 
of an artificial satellite about the Earth, or to incorporate the 
effect of planetary oblateness on the evolution of the orbits of 
natural satellites, such as the Moon, or the Galilean satellites of 
Jupiter. However, as the principal application here, we will use 
it to investigate the rotational dynamics of natural satellites and 
planets. 

The potential energy depends on the position of the point mass 
relative to the rigid body and on the orientation of the rigid body. 
Thus the changing orientation is coupled to the orbital evolution; 
each affects the other. However, in many situations the effect of 
the orientation of the body on the evolution of the orbit may be 
ignored. One way to see this is to look at the relative magnitudes 
of the two terms in the potential energy (2.74). We already know 
that the second term is guaranteed to be smaller than the first by a factor of $(\xi_{\max}/R)^2$, but often it is much smaller still because the body involved is nearly spherical. For example, the radius of the Moon is about a third the radius of the Earth and the distance to the Moon is about 60 Earth-radii. So the second term is smaller than the first by a factor of order $10^{-4}$ due to the size factors. In addition the Moon is roughly spherical and for any orientation the combination $A + B + C - 3I$ is of order $10^{-4}C$. Now $C$ is itself of order $\frac{2}{5}MR^2$ (here $R$ is the radius of the Moon), because the density of the Moon does not vary strongly with radius. So for the Moon the second term is of order $10^{-8}$ relative to the first. Even radical changes in the orientation of the Moon would have little dynamical effect on the orbit of the Moon.

We can learn some important qualitative aspects of the orien- 
tation dynamics by studying a simplified model problem. First, 
we assume that the body is rotating about its largest moment of 
inertia. This is a natural assumption. Remember that for a free 
rigid body the loss of energy while conserving angular momentum
leads to rotation about the largest moment of inertia. This is 
observed for most bodies in the solar system. Next, we assume 
that the spin axis is perpendicular to the orbital motion. This is a 
good approximation for the rotation of natural satellites, and is a 
natural consequence of tidal friction—dissipative solid body tides 
raised on the satellite by the gravitational interaction with the 
planet. Finally, for simplicity we take the rigid body to be mov- 
ing on a fixed elliptic orbit. This may approximate the motion 
of some physical systems, provided the timescale of the evolution 
of the orbit is large compared to any timescale associated with 
the rotational dynamics that we are investigating. So we have 
a nice toy problem. This problem has been used to investigate 
the rotational dynamics of Mercury, the Moon, and other natural 
satellites. It makes specific predictions concerning the rotation of 
Phobos, a satellite of Mars, which can be compared with observa- 
tions. It provides a basic understanding of the fact that Mercury 
rotates precisely 3 times for every 2 orbits it completes, and is the 
starting point for understanding the chaotic tumbling of Saturn’s 
satellite Hyperion. 




Figure 2.10 The spin-orbit model problem in which the spin axis is 
constrained to be perpendicular to the orbit plane has a single degree 
of freedom, the orientation of the body in the orbit plane. Here the 
orientation is specified by the generalized coordinate $\theta$.
We are assuming that the orbit does not change or precess. The orbit is an ellipse with the point mass at a focus of the ellipse. The angle $f$ (see figure 2.10) measures the position of the rigid body in its orbit relative to the point in the orbit at which the two bodies are closest.¹⁸ We assume the orbit is a fixed ellipse, so the angle $f$ and the distance $R$ are periodic functions of time, with period equal to the orbit period. With the spin axis constrained to be perpendicular to the orbit plane, the orientation of the rigid body is specified by a single degree of freedom: the orientation of the body about the spin axis. We specify this orientation by the generalized coordinate $\theta$ that measures the angle to the $\hat a$ principal axis from the same line as we measure $f$, the line through the point of closest approach.

Having specified the coordinate system, we can work out the 
details of the kinetic and potential energies, and thus find the 
Lagrangian. The kinetic energy is 


$$T(t, \theta, \dot\theta) = \tfrac{1}{2} C \dot\theta^2, \tag{2.75}$$


where C is the moment of inertia about the spin axis, and the angular velocity of the body about the $\hat c$ axis is $\dot\theta$. There is no component of angular velocity on the other principal axes.

To get an explicit expression for the potential energy we must write the direction cosines in terms of $\theta$ and $f$: $\alpha = -\cos(\theta - f)$, $\beta = \sin(\theta - f)$, and $\gamma = 0$ because the $\hat c$ axis is perpendicular to the orbit plane. The potential energy is then

$$V = -\frac{GMM'}{R} - \frac{GM'}{2R^3}\Big[\big(1 - 3\cos^2(\theta - f)\big)A + \big(1 - 3\sin^2(\theta - f)\big)B + C\Big].$$

Since we are assuming that the orbit is given, we only need to keep terms that depend on $\theta$. Expanding the squares of the cosine and the sine in terms of the double angles, and dropping all the


18 Traditionally, the point in the orbit at which the two bodies are closest is 
called the pericenter, and the angle f is called the true anomaly. 
terms that do not depend on $\theta$, we find the potential energy for the orientation¹⁹

$$V(t, \theta) = -\frac{3}{4}\frac{GM'}{R(t)^3}(B - A)\cos 2\big(\theta - f(t)\big).$$


A Lagrangian for the model spin-orbit coupling problem is then $L = T - V$:

$$L(t, \theta, \dot\theta) = \tfrac{1}{2} C \dot\theta^2 + \frac{3}{4}\frac{GM'}{R(t)^3}(B - A)\cos 2\big(\theta - f(t)\big). \tag{2.77}$$


We introduce the dimensionless “out-of-roundness” parameter

$$\epsilon = \sqrt{\frac{3(B - A)}{C}}, \tag{2.78}$$

and use the fact that the orbit frequency $n$ satisfies Kepler's third law $n^2a^3 = G(M + M')$, which is approximately $n^2a^3 = GM'$ for a small body in orbit around a much more massive one ($M \ll M'$). In terms of $\epsilon$ and $n$ the spin-orbit Lagrangian is

$$L(t, \theta, \dot\theta) = \tfrac{1}{2} C \dot\theta^2 + \frac{n^2\epsilon^2 C}{4}\left(\frac{a}{R(t)}\right)^3 \cos 2\big(\theta - f(t)\big). \tag{2.79}$$


This is a problem with one degree of freedom with terms that vary 
periodically with time. 
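To evaluate this Lagrangian one needs $f(t)$ and $R(t)$ on the fixed ellipse. A minimal Python sketch, assuming pericenter passage at $t = 0$ and solving Kepler's equation $E - e\sin E = nt$ for the eccentric anomaly $E$ (a quantity the text does not introduce) by Newton's method, might look like this:

```python
import math


def true_anomaly_and_distance(t, e, n=1.0, a=1.0):
    """Return (f, R) at time t on a fixed Kepler ellipse, pericenter at t = 0.

    Solves Kepler's equation E - e sin E = M for the eccentric anomaly E by
    Newton's method, then converts E to the true anomaly f in [0, 2 pi) and
    the distance R = a (1 - e cos E).
    """
    M = (n * t) % (2 * math.pi)          # mean anomaly in [0, 2 pi)
    E = M if e < 0.8 else math.pi        # common starting guesses
    for _ in range(50):
        dE = (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
        E -= dE
        if abs(dE) < 1e-14:
            break
    f = 2 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                       math.sqrt(1 - e) * math.cos(E / 2))
    return f, a * (1 - e * math.cos(E))


# f and R repeat with the orbit period 2 pi / n.
print(true_anomaly_and_distance(1.0, 0.05))
print(true_anomaly_and_distance(1.0 + 2 * math.pi, 0.05))
```

Reducing the mean anomaly to one period keeps Newton's method on a single branch; the periodicity of $f$ and $R$ noted in the text then comes for free.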

The Lagrange equation is derived in the usual manner. It is

$$C D^2\theta(t) = -\frac{n^2\epsilon^2 C}{2}\left(\frac{a}{R(t)}\right)^3 \sin 2\big(\theta(t) - f(t)\big). \tag{2.80}$$
The equation of motion is very similar to that of the periodically 
driven pendulum. The main difference here is that not only is the 
strength of the acceleration changing periodically, but in the spin- 
orbit problem the center of attraction is also varying periodically. 

We can give a physical interpretation of this equation of motion. 
It states that the rate of change of the angular momentum is equal 
to the applied torque. The torque on the body arises because the 


19 The given potential energy differs from the actual potential energy in that nonconstant terms that do not depend on θ, and consequently do not affect the evolution of θ, have been dropped.
body is out of round and the gravitational force varies as the inverse square of the distance. Thus the force per unit mass on the near side of the body is stronger than the acceleration of the body as a whole, and the force per unit mass on the far side of the body is a little less than the acceleration of the body as a whole. Thus, relative to the acceleration of the body as a whole, the far side is forced outward while the near side of the body is forced inward. The net effect is a torque on the body, which tries to align the long axis of the body with the line to the external point mass. If $\theta$ is a bit larger than $f$ then there is a negative torque, and if $\theta$ is a bit smaller than $f$ then there is a positive torque, both of which would align the long axis with the planet if given a fair chance. The torque arises because of the difference of the inverse $R^2$ force across the body, so the torque is proportional to $R^{-3}$. There is only a torque if the body is out of round, for otherwise there is no handle to pull on. This is reflected in the factor $B - A$, which appears in the expression for the torque. The potential depends only on the moments of inertia, so the body has the same dynamics if it is rotated by 180°. The factor of 2 in the argument of the sine reflects this symmetry. This torque is called the “gravity gradient torque.”
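The sign, symmetry, and $R^{-3}$ claims can be checked directly from the orientation potential $V = -\frac{3}{4}(GM'/R^3)(B - A)\cos 2(\theta - f)$. This Python sketch uses hypothetical values $G = M' = 1$, $A = 0.9$, and $B = 1.0$, and compares the analytic torque $-\partial V/\partial\theta$ with a finite difference.

```python
import math

G, M_PRIME, A, B = 1.0, 1.0, 0.9, 1.0   # hypothetical values with B > A


def orientation_potential(theta, f, R):
    """The theta-dependent part of the potential energy."""
    return -0.75 * G * M_PRIME / R**3 * (B - A) * math.cos(2 * (theta - f))


def gravity_gradient_torque(theta, f, R):
    """Torque -dV/dtheta about the spin axis."""
    return -1.5 * G * M_PRIME / R**3 * (B - A) * math.sin(2 * (theta - f))


for theta in (-0.1, 0.1, 0.1 + math.pi):
    # negative for theta slightly above f, positive slightly below,
    # and unchanged by a 180 degree rotation of the body
    print(theta, gravity_gradient_torque(theta, 0.0, 1.0))
```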

To compute the evolution requires a bunch of detailed prepara- 
tion similar to what has been done for other problems. There are 
many interesting phenomena to explore. We can take parameters 
appropriate for the Moon, and find that Mr. Moon does not con- 
stantly point the same face to the Earth, but instead constantly 
shakes his head in dismay at what goes on here. If we nudge the 
Moon a bit, say by hitting it with an asteroid, we find that the 
long axis oscillates back and forth with respect to the direction 
that points to the Earth. For the Moon, the orbital eccentric- 
ity is currently about 0.05, and the out-of-roundness parameter is 
about $\epsilon = 0.026$. Figure 2.11 shows the angle $\theta - f$ as a function of time for two different values of the “lunar” eccentricity. The plot spans 50 lunar orbits, or a little under 4 years. This Moon has been kicked by a large asteroid and has initial rotational angular velocity $D\theta$ equal to 1.01 times the orbit frequency. The initial orientation is $\theta = 0$. The smooth trace shows the evolution if the orbital eccentricity is set to zero. We see an oscillation with a period of about 40 lunar orbit periods or about 3 years. The more wiggly trace shows the evolution of $\theta - f$ with an orbital eccentricity of 0.05, near the current lunar eccentricity. The lunar
Figure 2.11 The angle $\theta - f$ versus time for 50 orbit periods. The ordinate scale is $\pm 1$ radian. The Moon has been kicked so that the initial rotational angular velocity is 1.01 times the orbital frequency. The trace with fewer wiggles was computed with zero lunar orbital eccentricity; the other trace was computed with lunar orbital eccentricity of 0.05. The rapid oscillations have the period of the lunar orbit and are due mostly to the nonuniform motion of $f$.


eccentricity superimposes an apparent shaking of the face of the 
Moon back and forth with the period of the lunar orbit. Though
the Moon does slightly change its rate of rotation during the course 
of its orbit, most of this shaking is due to the nonuniform motion 
of the Moon in its elliptical orbit. This oscillation is called the 
“optical libration of the Moon,” and it allows us to see a bit more 
than half the surface of the Moon. The longer period oscillation 
induced by the kick is called the “free libration of the Moon.” It 
is “free” because we are free to excite it by choosing appropriate 
initial conditions. The mismatch of the orientation of the Moon
caused by the optical libration actually produces a periodic torque 
on the Moon, which slightly speeds up and slows down the Moon 
during every orbit. The resulting oscillation is called the “forced 
libration of the Moon,” but it is too small to see in this plot. 
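The kind of computation behind figure 2.11 can be sketched in Python with a hand-written fourth-order Runge-Kutta step. The sketch integrates equation (2.80) in units with $n = a = 1$ (so one orbit takes time $2\pi$), takes $f(t)$ and $R(t)$ from Kepler's equation, and records $\theta - f$ reduced to $(-\pi, \pi]$. The step size and helper names are our assumptions; this is not the code that produced the figure.

```python
import math


def kepler(t, e):
    """True anomaly f and distance R (a = n = 1, pericenter at t = 0)."""
    M = t % (2 * math.pi)
    E = M if e < 0.8 else math.pi
    for _ in range(50):
        dE = (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
        E -= dE
        if abs(dE) < 1e-14:
            break
    f = 2 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                       math.sqrt(1 - e) * math.cos(E / 2))
    return f, 1 - e * math.cos(E)


def accel(t, theta, e, eps):
    """D^2 theta from equation (2.80) with n = a = 1."""
    f, R = kepler(t, e)
    return -0.5 * eps**2 * R**-3 * math.sin(2 * (theta - f))


def wrap(x):
    """Reduce an angle to the interval (-pi, pi]."""
    return math.pi - (math.pi - x) % (2 * math.pi)


def spin_orbit(theta0, dtheta0, e, eps, orbits=50, steps_per_orbit=400):
    """RK4 integration; returns a list of (t, theta - f) samples."""
    h = 2 * math.pi / steps_per_orbit
    t, th, w = 0.0, theta0, dtheta0
    out = []
    for _ in range(orbits * steps_per_orbit):
        k1x, k1v = w, accel(t, th, e, eps)
        k2x, k2v = w + 0.5 * h * k1v, accel(t + 0.5 * h, th + 0.5 * h * k1x, e, eps)
        k3x, k3v = w + 0.5 * h * k2v, accel(t + 0.5 * h, th + 0.5 * h * k2x, e, eps)
        k4x, k4v = w + h * k3v, accel(t + h, th + h * k3x, e, eps)
        th += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        w += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += h
        out.append((t, wrap(th - kepler(t, e)[0])))
    return out


# The "kicked Moon": theta(0) = 0, D theta(0) = 1.01 n, epsilon = 0.026.
circular = spin_orbit(0.0, 1.01, e=0.0, eps=0.026)
eccentric = spin_orbit(0.0, 1.01, e=0.05, eps=0.026)
print(max(abs(d) for _, d in circular), max(abs(d) for _, d in eccentric))
```

With these inputs the circular-orbit run shows only the slow free libration; energy conservation in the zero-eccentricity equation puts its amplitude a little under 0.4 radian. The eccentric run adds the once-per-orbit optical libration on top.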
The oscillation period of the free libration is easily calculated. 
We see that the eccentricity of the orbit does not substantially 
affect the period, so consider the special case of zero eccentricity. In this case $R = a$, a constant, and $f(t) = nt$, where $n$ is the orbital frequency (traditionally called the mean motion). The equation of motion becomes

$$D^2\theta(t) = -\frac{n^2\epsilon^2}{2}\sin 2\big(\theta(t) - nt\big). \tag{2.81}$$


Let $\varphi(t) = \theta(t) - nt$, so that $D\varphi(t) = D\theta(t) - n$ and $D^2\varphi = D^2\theta$. Substituting these, the equation governing the evolution of $\varphi$ is

$$D^2\varphi = -\frac{n^2\epsilon^2}{2}\sin 2\varphi. \tag{2.82}$$


For small deviations from synchronous rotation (small $\varphi$) this is

$$D^2\varphi = -n^2\epsilon^2\varphi, \tag{2.83}$$

so we see that the small-amplitude oscillation frequency of $\varphi$ is $n\epsilon$. For the Moon, $\epsilon$ is about 0.026, so the period is about $1/0.026 \approx 38$ orbit periods, or roughly 40 lunar orbit periods, which is what we observed.
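The estimate $2\pi/(n\epsilon)$ can be checked by integrating the pendulum-like equation (2.82) directly and timing successive upward zero crossings of $\varphi$. This Python sketch assumes $n = 1$, a small kick $D\varphi(0) = 10^{-3}n$, and a fixed RK4 step; these are illustrative choices.

```python
import math


def libration_period(eps, n=1.0, kick=1e-3, h=0.05):
    """Period of D^2 phi = -(n^2 eps^2 / 2) sin 2 phi from phi(0) = 0, D phi(0) = kick."""
    def acc(phi):
        return -0.5 * n**2 * eps**2 * math.sin(2 * phi)

    t, phi, w = 0.0, 0.0, kick
    crossings = []
    while len(crossings) < 2:
        k1x, k1v = w, acc(phi)
        k2x, k2v = w + 0.5 * h * k1v, acc(phi + 0.5 * h * k1x)
        k3x, k3v = w + 0.5 * h * k2v, acc(phi + 0.5 * h * k2x)
        k4x, k4v = w + h * k3v, acc(phi + h * k3x)
        new_phi = phi + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        w += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if phi < 0.0 <= new_phi:                       # upward zero crossing
            crossings.append(t + h * (-phi) / (new_phi - phi))
        phi, t = new_phi, t + h
    return crossings[1] - crossings[0]


eps = 0.026
T = libration_period(eps)
print(T / (2 * math.pi), 1 / eps)   # period in orbit periods vs. the estimate 1/eps
```

For such a small kick the measured period should agree with $1/\epsilon \approx 38.5$ orbit periods to well under one percent; larger kicks lengthen it, as for any pendulum.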

It is perhaps more fun to see what happens if the out-of- 
roundness parameter is large. After our experience with the driven 
pendulum it is no surprise that we find abundant chaos in the spin- 
orbit problem when the system is strongly driven by having large 
$\epsilon$ and significant $e$. There is indeed one body in the solar system that exhibits chaotic rotation: Hyperion, a small satellite of Saturn. Though our model is not adequate for a complete account of Hyperion, we can show that our toy model exhibits chaotic behavior for parameters appropriate for Hyperion. We take $\epsilon = 0.89$ and $e = 0.1$. Figure 2.12 shows $\theta - f$ for 50 orbits, starting with $\theta = 0$ and $D\theta = 1.05$. We see that sometimes one face of the body oscillates facing the planet, sometimes the other face oscillates facing the planet, and sometimes the body rotates relative to the planet in either direction.

If we were to relax our restriction that the spin axis is fixed perpendicular to the orbit, then we would find that the Moon maintains this orientation of the spin axis even if nudged a bit, but for Hyperion the spin axis almost immediately falls away from this configuration. The state in which Hyperion on average points one face to
Figure 2.12 The angle $\theta - f$ versus time for 50 orbit periods. The ordinate scale is $\pm\pi$ radians. The out-of-roundness parameter is large, $\epsilon = 0.89$, with an orbital eccentricity of $e = 0.1$. The system is strongly driven. The rotation is apparently chaotic.


Saturn is dynamically unstable to chaotic tumbling. Observations 
of Hyperion have confirmed that Hyperion is chaotically tumbling. 


2.12 Euler’s Equations 


For a free rigid body we have seen that the components of the 
angular momentum on the principal axes form a self-contained
dynamical system: the variation of the principal axis components 
depends only on the principal axis components. Here we derive 
equations that govern the evolution of these components. 

The starting point for the derivation is the conservation of the 
vector angular momentum. The components of the angular mo- 
mentum on the principal axes are 


$$L' = I'\omega', \tag{2.84}$$
where $\omega'$ is composed of the components of the angular velocity vector on the principal axes, and $I'$ is the matrix representation of the inertia tensor with respect to the principal axis basis:

$$I' = \begin{bmatrix} A & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & C \end{bmatrix}. \tag{2.85}$$


The body components of the angular momentum $L'$ are related to the components $L$ on the fixed rectangular basis $\hat e_i$ by

$$L = M L', \tag{2.86}$$


where M is the matrix representation of the rotation that carries 
the body and all vectors attached to the body from the reference 
orientation of the body to the actual orientation. 

The vector angular momentum is conserved for free rigid body 
motion, and so are its components on a fixed rectangular basis. 
So, along solution paths 


$$0 = DL = DM\,L' + M\,DL'. \tag{2.87}$$

Solving, we find

$$DL' = -M^T DM\,L'. \tag{2.88}$$

In terms of $\omega'$ this is

$$I' D\omega' = -M^T DM\,I'\omega' = -M^T A(M\omega')\,M\,I'\omega', \tag{2.89}$$

where we have used equation (2.38) to write $DM$ in terms of $A$. The function $A$ has the property²⁰

$$R^T A(Rv)\,R = A(v) \tag{2.90}$$


20 Rotating the cross product of two vectors gives the same vector that is obtained by taking the cross product of the two rotated vectors: $R(\vec u \times \vec v) = (R\vec u) \times (R\vec v)$.
for any vector with components $v$ and any rotation with matrix representation $R$. Using this property of $A$ we find Euler's equations:

$$I' D\omega' = -A(\omega')\,I'\omega'. \tag{2.91}$$


Euler’s equations give the time derivative of the body components 
of the angular velocity vector entirely in terms of the components 
of the angular velocity vector and the principal moments of inertia. Let $\omega^a$, $\omega^b$, and $\omega^c$ denote the components of the angular velocity vector on the principal axes. Then Euler's equations can be written as the component equations

$$\begin{aligned} A\,D\omega^a &= (B - C)\,\omega^b\omega^c \\ B\,D\omega^b &= (C - A)\,\omega^c\omega^a \\ C\,D\omega^c &= (A - B)\,\omega^a\omega^b. \end{aligned} \tag{2.92}$$


Alternatively, we can rewrite Euler's equations in terms of the components of the angular momentum on the principal axes:

$$DL' = -A\big((I')^{-1}L'\big)\,L'. \tag{2.93}$$


These equations confirm that the time derivatives of the com- 
ponents of the angular momentum on the principal axes depend 
only on the components of the angular momentum on the principal 
axes. 
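Because Euler's equations close on themselves, they are easy to integrate numerically. The Python sketch below, with hypothetical principal moments $A, B, C = 1, 2, 3$ and a hand-written RK4 step, integrates equations (2.92) and checks the two conserved quantities of free rigid body motion: the kinetic energy and the squared magnitude of the angular momentum.

```python
def euler_rhs(w, A, B, C):
    """Right-hand side of the free-body Euler equations (2.92)."""
    wa, wb, wc = w
    return ((B - C) * wb * wc / A,
            (C - A) * wc * wa / B,
            (A - B) * wa * wb / C)


def rk4_step(w, h, A, B, C):
    """One classical fourth-order Runge-Kutta step for (2.92)."""
    def shift(u, k, s):
        return tuple(ui + s * ki for ui, ki in zip(u, k))
    k1 = euler_rhs(w, A, B, C)
    k2 = euler_rhs(shift(w, k1, h / 2), A, B, C)
    k3 = euler_rhs(shift(w, k2, h / 2), A, B, C)
    k4 = euler_rhs(shift(w, k3, h), A, B, C)
    return tuple(wi + h * (p + 2 * q + 2 * r + s) / 6
                 for wi, p, q, r, s in zip(w, k1, k2, k3, k4))


def invariants(w, A, B, C):
    """Kinetic energy and |L'|^2 in terms of the principal-axis components."""
    wa, wb, wc = w
    energy = 0.5 * (A * wa**2 + B * wb**2 + C * wc**2)
    L2 = (A * wa)**2 + (B * wb)**2 + (C * wc)**2
    return energy, L2


A, B, C = 1.0, 2.0, 3.0           # hypothetical principal moments
w = (0.1, 1.0, 0.2)               # spin mostly about the intermediate axis
E0, L0 = invariants(w, A, B, C)
for _ in range(20000):            # integrate to t = 20
    w = rk4_step(w, 0.001, A, B, C)
E1, L1 = invariants(w, A, B, C)
print(E1 - E0, L1 - L0)           # both should be tiny
```

The integrator does not enforce either invariant, so their drift is a useful measure of numerical accuracy.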

Euler’s equations are very simple, but they do not completely 
determine the evolution of a rigid body—they do not give the spa- 
tial orientation of the body. However, equation (2.38) and prop- 
erty (2.90) can be used to relate the derivative of the orientation 
matrix to the body components of the angular velocity vector: 


$$DM = M\,A(\omega'). \tag{2.94}$$


A straightforward method of using these equations is to integrate 
them componentwise as a set of nine first order ordinary differ- 
ential equations, with initial conditions determining the initial 
configuration matrix. Together with Euler’s equations, which de- 
scribe how the body components of the angular velocity vector 
change with time, this system of equations governing the motion 
of a rigid body is complete. However, the reader will no doubt 
have noticed that this approach is rather wasteful. The fact that 
the orientation matrix can be specified with only three parameters
has not been taken into account. We should be integrating three 
equations for the orientation, given w’, not nine. To accomplish 
this we once again need to parameterize the configuration matrix. 

For example, we can use Euler angles to parameterize the ori- 
entation: 


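$$\mathcal{M}(\theta, \varphi, \psi) = R_z(\varphi)\,R_x(\theta)\,R_z(\psi). \tag{2.95}$$

We form $M$ by composing $\mathcal{M}$ with an Euler coordinate path. Equation (2.94) can then be used to solve for $D\theta$, $D\varphi$, and $D\psi$. We find

$$\begin{bmatrix} D\theta \\ D\varphi \\ D\psi \end{bmatrix} = \frac{1}{\sin\theta}\begin{bmatrix} \cos\psi\sin\theta & -\sin\psi\sin\theta & 0 \\ \sin\psi & \cos\psi & 0 \\ -\sin\psi\cos\theta & -\cos\psi\cos\theta & \sin\theta \end{bmatrix}\begin{bmatrix} \omega^a \\ \omega^b \\ \omega^c \end{bmatrix}. \tag{2.96}$$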


This gives us the desired equation for the orientation. Note that it is singular for $\theta = 0$, as are Lagrange's equations. So Euler's equations using Euler angles for the configuration have the same problem as did the Lagrange equations using Euler angles. Again, this is a manifestation of the fact that for $\theta = 0$ the orientation depends only on $\varphi + \psi$. The singularity in the equations of motion for $\theta = 0$ does not correspond to anything funny in the motion of the
rigid body. A practical solution to the singularity problem is to 
choose another set of Euler-like angles that have a singularity in a 
different place, and switch from one to the other when the going 
gets tough. 
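Equation (2.96) can be checked numerically without symbolic algebra, which may help with exercise 2.15. The Python sketch below builds $M = R_z(\varphi)R_x(\theta)R_z(\psi)$ along a hypothetical Euler-angle path with constant rates, extracts $\omega'$ from $M^T DM = A(\omega')$ (equation 2.94) using a central difference for $DM$, and feeds $\omega'$ back through (2.96). The recovered rates should match the rates of the path.

```python
import math


def rot_x(a):
    """Active rotation by angle a about the x axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(a):
    """Active rotation by angle a about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_matrix(theta, phi, psi):
    """Equation (2.95): M = R_z(phi) R_x(theta) R_z(psi)."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


def euler_angle_rates(theta, psi, omega_body):
    """Equation (2.96): (D theta, D phi, D psi) from (omega^a, omega^b, omega^c).

    Singular where sin(theta) = 0.
    """
    wa, wb, wc = omega_body
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(psi), math.cos(psi)
    return (cp * wa - sp * wb,
            (sp * wa + cp * wb) / st,
            (-sp * ct * wa - cp * ct * wb + st * wc) / st)


def body_omega_numeric(path, t, h=1e-6):
    """omega' from A(omega') = M^T DM (equation 2.94), with DM by central difference."""
    M = euler_matrix(*path(t))
    Mp, Mm = euler_matrix(*path(t + h)), euler_matrix(*path(t - h))
    W = [[sum(M[k][i] * (Mp[k][j] - Mm[k][j]) / (2 * h) for k in range(3))
          for j in range(3)] for i in range(3)]          # M^T DM, antisymmetric
    return (W[2][1], W[0][2], W[1][0])                   # read off omega' from A(omega')


def path(t):
    """Hypothetical Euler-angle path (theta, phi, psi) with constant rates."""
    return (1.0 + 0.3 * t, 0.5 - 0.7 * t, 0.2 + 1.1 * t)


theta, phi, psi = path(0.4)
print(euler_angle_rates(theta, psi, body_omega_numeric(path, 0.4)))  # about (0.3, -0.7, 1.1)
```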


Exercise 2.15: 


Fill in the details of the derivation of equation (2.96). You may want to 
use the computer to help with the algebra. 


Euler’s equations for forced rigid bodies 

Euler’s equations were derived for a free rigid body. In general, 
we must be able to deal with external forcing. How do we do 
this? First, we derive expressions for the vector torque. Then we 
include the vector torque in the Euler equations. 

We derive the vector torque in a manner analogous to the 
derivation of the vector angular momentum. That is, we derive 
one component and then argue that since the coordinate system 
is arbitrary, all components have the same form. 
Suppose we have a rigid body subject to some potential energy
that depends only on time and the configuration. A Lagrangian 
is $L = T - V$. If we use the Euler angles as generalized coordinates, the last of the three active Euler rotations that define the orientation is a rotation about the $z$ axis. The magnitude of this rotation is given by the angle $\varphi$. The Lagrange equation for $\varphi$ gives²¹

$$Dp_\varphi(t) = -\partial_{1,1}V\big(t; \theta(t), \varphi(t), \psi(t)\big). \tag{2.97}$$

If we define $T_z$, the component of the torque about the $z$ axis, to be minus the derivative of the potential energy with respect to the angle of rotation of the body about the $z$ axis,

$$T_z(t) = -\partial_{1,1}V\big(t; \theta(t), \varphi(t), \psi(t)\big), \tag{2.98}$$

then we see that

$$Dp_\varphi(t) = T_z(t). \tag{2.99}$$

We have already identified the momentum conjugate to $\varphi$ as one component, $L_z$, of the vector angular momentum $L$ (see section 2.9), so

$$DL_z = T_z. \tag{2.100}$$


Since the orientation of the reference rectangular basis vectors is 
arbitrary we may choose them any way that we please. Thus if we 
want any component of the vector torque, we may choose the z- 
axis so that we can compute it in this way. We can conclude that 
the vector torque gives the rate of change of the vector angular 
momentum 


$$DL = T. \tag{2.101}$$


Having obtained a general prescription for the vector torque, we 
address how the vector torque may be included in Euler’s equa- 
tions. Euler’s equations expressed the fact that the vector angular 


21 In this equation we have a partial derivative with respect to a component of the coordinate argument of the potential energy function. The first subscript on the ∂ symbol indicates the coordinate argument. The second one selects the φ component.
momentum is conserved. Let's return to that calculation, but now include a torque with components $T$:

$$DL = T = DM\,L' + M\,DL'. \tag{2.102}$$

Carrying out the same steps as before, we find

$$DL' + A\big((I')^{-1}L'\big)\,L' = T', \tag{2.103}$$

where the components of the vector torque on the principal axes are

$$T' = M^T T. \tag{2.104}$$

In terms of $\omega'$ this is

$$I' D\omega' + A(\omega')\,I'\omega' = T'. \tag{2.105}$$

In components,

$$A\,D\omega^a - (B - C)\,\omega^b\omega^c = T^a \tag{2.106}$$

$$B\,D\omega^b - (C - A)\,\omega^c\omega^a = T^b \tag{2.107}$$

$$C\,D\omega^c - (A - B)\,\omega^a\omega^b = T^c. \tag{2.108}$$


Note that the torque entered only the equations for the body 
angular momentum or alternately for the body angular velocity 
vector. The equations that relate the derivative of the orientation 
to the angular velocity vector are not modified by the torque. In a 
sense, Euler’s equations contain the dynamics, and the equations 
governing the orientation are kinematic. Of course, Lagrange’s 
equations must be modified by the potential that gives rise to the 
torques; in this sense Lagrange’s equations contain both dynamics 
and kinematics. 
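Adding a torque changes only the right-hand sides of the component equations. Here is a hedged Python sketch of (2.106) through (2.108), with the torque supplied by the caller as a function of time and $\omega'$. The axisymmetric test body ($A = B = 2$, $C = 1$) and the constant torque about the symmetry axis are hypothetical choices, made because the answer is known exactly: $\omega^c$ grows linearly at rate $T^c/C$ while $(\omega^a)^2 + (\omega^b)^2$ stays fixed.

```python
def forced_euler_rhs(t, w, torque, A, B, C):
    """D omega' from equations (2.106)-(2.108); torque(t, w) returns (T^a, T^b, T^c)."""
    wa, wb, wc = w
    Ta, Tb, Tc = torque(t, w)
    return (((B - C) * wb * wc + Ta) / A,
            ((C - A) * wc * wa + Tb) / B,
            ((A - B) * wa * wb + Tc) / C)


def integrate(w, torque, A, B, C, h=0.001, steps=10000):
    """Classical RK4 from t = 0; returns omega' at t = h * steps."""
    t = 0.0
    for _ in range(steps):
        k1 = forced_euler_rhs(t, w, torque, A, B, C)
        k2 = forced_euler_rhs(t + h / 2, tuple(x + h / 2 * k for x, k in zip(w, k1)), torque, A, B, C)
        k3 = forced_euler_rhs(t + h / 2, tuple(x + h / 2 * k for x, k in zip(w, k2)), torque, A, B, C)
        k4 = forced_euler_rhs(t + h, tuple(x + h * k for x, k in zip(w, k3)), torque, A, B, C)
        w = tuple(x + h * (p + 2 * q + 2 * r + s) / 6 for x, p, q, r, s in zip(w, k1, k2, k3, k4))
        t += h
    return w


def axial_torque(t, w):
    """Hypothetical constant torque about the c axis."""
    return (0.0, 0.0, 0.5)


w_end = integrate((0.4, 0.0, 0.3), axial_torque, A=2.0, B=2.0, C=1.0)
print(w_end)   # omega^c should be close to 0.3 + 0.5 * 10 / 1 = 5.3
```

As the text notes, only these dynamical equations see the torque; the kinematic equations for the orientation are unchanged.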


2.13 Nonsingular Generalized Coordinates 


The Euler angles provide a convenient way to parameterize the 
orientation of a rigid body. However, the equations of motion 
derived for them have singularities. Though we can avoid the 
singularities by using other Euler-like combinations with different 
singularities, this kludge is not very satisfying. Let’s brainstorm 
a bit and see if we can come up with something better. 
What does it take to specify an orientation? Perhaps we can
take a hint from Euler’s theorem. Recall that Euler’s theorem 
states that any orientation can be reached with a single rotation. 
So one idea to specify the orientation of a body is to parameterize 
this single rotation that does the job. To specify this rotation we 
need to specify the rotation axis and the amount of rotation. We 
contrast this with the Euler angles that specify three successive 
rotations. These three rotations need not have any relation to 
the single composite rotation that gives the orientation. Isn’t it 
curious that the Euler angles make no use of Euler’s theorem? 

We can think of several ways of specifying a rotation. One way would be to specify the rotation axis by the latitude and the longitude at which the rotation axis pierces a sphere. The amount of rotation needed to take the body from the reference position could be specified by one more angle. We can predict, though, that this choice of coordinates will have problems similar to those of the Euler angles: if the amount of rotation is zero, then the latitude and longitude of the rotation axis are undefined. So the Lagrange equations for these angles are probably singular. Another idea, without this defect, is to represent the rotation by the rectangular components of an orientation vector $\vec\sigma$; we take the direction of the orientation vector to be the same as the axis of rotation that takes the body from the reference orientation to the present orientation, and the length of the orientation vector is the angle by which the body must be rotated, in a right-hand sense, about the orientation vector. With this choice of coordinates, if the angle of rotation is zero then the vector is zero and has no unwanted direction. This choice looks promising.

We denote the rectangular components of $\vec\sigma$ by $(\sigma_x, \sigma_y, \sigma_z)$; these are our generalized coordinates. The magnitude $\sigma = \sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$ is the angle of rotation. The axis of the rotation is $\hat\sigma = \vec\sigma/\sigma$. We denote the components of $\hat\sigma$ by $\hat\sigma_x$, $\hat\sigma_y$, and $\hat\sigma_z$. The first step in implementing the components of the orientation vector as generalized coordinates is to construct the rotation $M$ to which the orientation vector $\vec\sigma$ corresponds. Let $\vec\xi'$ be a vector to one of the constituents of the body in the reference orientation, and $\vec\xi$ be the vector to that constituent after rotation by $M$:

$$\vec\xi = M\vec\xi'. \tag{2.109}$$
We can determine $M$ by considering how the rotation represented by $\vec\sigma$ affects the vector $\vec\xi'$. The component of $\vec\xi'$ parallel to $\hat\sigma$ is unaffected. The perpendicular component is reduced by the cosine of the rotation angle, and a component perpendicular to these two is generated that is proportional to the sine of the rotation angle and the magnitude of the perpendicular component. Let $(\vec\xi')^\parallel = (\vec\xi'\cdot\hat\sigma)\,\hat\sigma$ and $(\vec\xi')^\perp = \vec\xi' - (\vec\xi')^\parallel$; then

$$\vec\xi = (\vec\xi')^\parallel + (\vec\xi')^\perp\cos\sigma + \hat\sigma\times(\vec\xi')^\perp\sin\sigma. \tag{2.110}$$


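From this expression we can construct the equivalent rotation matrix. First define some useful primitive matrices:

$$A = A(\hat\sigma) = \begin{bmatrix} 0 & -\hat\sigma_z & \hat\sigma_y \\ \hat\sigma_z & 0 & -\hat\sigma_x \\ -\hat\sigma_y & \hat\sigma_x & 0 \end{bmatrix} \tag{2.111}$$

and

$$S = \begin{bmatrix} \hat\sigma_x^2 & \hat\sigma_x\hat\sigma_y & \hat\sigma_x\hat\sigma_z \\ \hat\sigma_x\hat\sigma_y & \hat\sigma_y^2 & \hat\sigma_y\hat\sigma_z \\ \hat\sigma_x\hat\sigma_z & \hat\sigma_y\hat\sigma_z & \hat\sigma_z^2 \end{bmatrix}, \tag{2.112}$$

with the identity

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{2.113}$$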


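The matrix A is antisymmetric and S is the symmetric outer product of the components of $\hat\sigma$. The matrix A implements the cross product of $\hat\sigma$ with other vectors, and the matrix S projects vectors onto the axis of the orientation vector. We have the following identities:

$$AA = S - I \tag{2.114}$$

$$SS = S \tag{2.115}$$

$$SA = 0 \tag{2.116}$$

$$AS = 0. \tag{2.117}$$

Because S extracts the parallel component, $I - S$ the perpendicular component, and A the cross product with $\hat\sigma$, equation (2.110) gives the rotation matrix in terms of these matrices:

$$M = I\cos\sigma + A\sin\sigma + S(1 - \cos\sigma). \tag{2.118}$$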
The inverse of a rotation is a rotation about the same axis but by the negative of the rotation angle. Thus the inverse of $M$ can be written down immediately:

$$M^{-1} = I\cos\sigma - A\sin\sigma + S(1 - \cos\sigma). \tag{2.119}$$


We verify that the inverse of the rotation matrix is the transpose 
of the rotation matrix by recalling that I and S are symmetric 
and A is antisymmetric. 
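Equations (2.118) and (2.119) translate directly into code. This Python sketch (plain nested lists as 3×3 matrices; the function names are ours) builds $M$ from an orientation vector and checks that the "negative angle" matrix really is both the transpose and the inverse, that the axis is left fixed, and that the rotation is right-handed.

```python
import math


def rotation_from_orientation_vector(sigma_vec, sign=1.0):
    """Equation (2.118): M = I cos s + A sin s + S (1 - cos s).

    With sign = -1 the A term changes sign, giving the inverse (2.119).
    """
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s == 0.0:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    ux, uy, uz = (x / s for x in sigma_vec)                 # unit axis sigma-hat
    A = [[0.0, -uz, uy], [uz, 0.0, -ux], [-uy, ux, 0.0]]   # cross product with sigma-hat
    u = (ux, uy, uz)
    c, sn = math.cos(s), math.sin(s)
    return [[(c if i == j else 0.0) + sign * sn * A[i][j] + (1 - c) * u[i] * u[j]
             for j in range(3)] for i in range(3)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(P, v):
    return [sum(P[i][k] * v[k] for k in range(3)) for i in range(3)]


sigma = (0.3, -0.5, 0.8)
M = rotation_from_orientation_vector(sigma)
M_inv = rotation_from_orientation_vector(sigma, sign=-1.0)
print(matmul(M, M_inv))                                  # close to the identity
print(apply(rotation_from_orientation_vector((0.0, 0.0, math.pi / 2)), (1.0, 0.0, 0.0)))  # x -> y
```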

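The computation of the angular velocity vector from $M^T$ and $DM$ is straightforward, though tedious; the angular velocity vector turns out to have a simple form:

$$\vec\omega = \left[I\,\frac{\sin\sigma}{\sigma} + A\,\frac{1 - \cos\sigma}{\sigma} + S\left(1 - \frac{\sin\sigma}{\sigma}\right)\right] D\vec\sigma. \tag{2.120}$$

The components of the angular velocity vector on the principal axes can be found by multiplying the above by $M^{-1} = M^T$:

$$\vec\omega' = \left[I\,\frac{\sin\sigma}{\sigma} - A\,\frac{1 - \cos\sigma}{\sigma} + S\left(1 - \frac{\sin\sigma}{\sigma}\right)\right] D\vec\sigma. \tag{2.121}$$

Let

$$W = I\,\frac{\sin\sigma}{\sigma} - A\,\frac{1 - \cos\sigma}{\sigma} + S\left(1 - \frac{\sin\sigma}{\sigma}\right); \tag{2.122}$$

then we have

$$\vec\omega' = W\,D\vec\sigma. \tag{2.123}$$

Solving, we find

$$D\vec\sigma = W^{-1}\vec\omega'. \tag{2.124}$$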


The matrix W is not an orthogonal matrix, so its inverse is not 
trivial, but we can use the properties of the primitive matrices to 
find it. Suppose we have a matrix of the form 


$$N = aI + bA + cS \tag{2.125}$$

that we wish to invert. Let's guess that the inverse matrix has a similar form:

$$N^{-1} = a'I + b'A + c'S. \tag{2.126}$$
We wish to find the coefficients $a'$, $b'$, and $c'$ so that $NN^{-1} = I$. Multiplying out and using the identities (2.114) through (2.117), we find three conditions on the coefficients:

$$1 = aa' - bb' \tag{2.127}$$

$$0 = ab' + ba' \tag{2.128}$$

$$0 = ac' + ca' + bb' + cc', \tag{2.129}$$

with solution

$$a' = \frac{a}{a^2 + b^2} \tag{2.130}$$

$$b' = \frac{-b}{a^2 + b^2} \tag{2.131}$$

$$c' = \frac{b^2 - ac}{a^3 + a^2c + ab^2 + b^2c}. \tag{2.132}$$


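We can now invert the matrix $W$ using its representation in terms of primitive matrices to find

$$W^{-1} = \frac{1}{2}\,\frac{\sigma\sin\sigma}{1 - \cos\sigma}\,I + \frac{\sigma}{2}\,A + \left(1 - \frac{1}{2}\,\frac{\sigma\sin\sigma}{1 - \cos\sigma}\right)S. \tag{2.133}$$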
Note that all terms have finite limits as $\sigma \to 0$. There is, however, a new singularity. As $\sigma \to 2\pi$ two of the denominators vanish, and the zeros in the numerators are not strong enough to cancel the singularity. This is the expected singularity that corresponds to the fact that at radius $2\pi$ the orientation vector corresponds to no rotation, but nevertheless specifies a rotation axis. This singularity is easy to avoid. Whenever the orientation vector develops a magnitude larger than $\pi$, simply replace it by the equivalent orientation vector $\vec\sigma - 2\pi\hat\sigma$.
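The inversion recipe (2.125) through (2.132) and the closed form (2.133) can be cross-checked numerically. This Python sketch (our own helper names, plain lists for matrices) builds $N = aI + bA + cS$ for a hypothetical unit vector, applies the coefficient formulas, confirms that the product is the identity, and checks that the general formulas reproduce (2.133) when applied to $W$.

```python
import math


def primitive_matrices(u):
    """I, A = A(u), and S = u u^T for a unit vector u (equations 2.111-2.113)."""
    ux, uy, uz = u
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    A = [[0.0, -uz, uy], [uz, 0.0, -ux], [-uy, ux, 0.0]]
    S = [[u[i] * u[j] for j in range(3)] for i in range(3)]
    return I, A, S


def combine(a, b, c, u):
    """The matrix a I + b A + c S."""
    I, A, S = primitive_matrices(u)
    return [[a * I[i][j] + b * A[i][j] + c * S[i][j] for j in range(3)] for i in range(3)]


def inverse_coefficients(a, b, c):
    """(a', b', c') from equations (2.130)-(2.132)."""
    d = a * a + b * b
    return a / d, -b / d, (b * b - a * c) / (a**3 + a * a * c + a * b * b + b * b * c)


def W_coefficients(s):
    """Coefficients of W in equation (2.122)."""
    return math.sin(s) / s, -(1 - math.cos(s)) / s, 1 - math.sin(s) / s


def W_inverse_coefficients(s):
    """Coefficients of W^{-1} in equation (2.133)."""
    h = 0.5 * s * math.sin(s) / (1 - math.cos(s))
    return h, 0.5 * s, 1 - h


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


u = (2 / 7, 3 / 7, 6 / 7)          # a hypothetical unit vector
for s in (0.3, 1.0, 2.5, 3.1):
    general = inverse_coefficients(*W_coefficients(s))
    print(s, general, W_inverse_coefficients(s))
```

For $W$ the sum $a + c$ equals 1, which is why the denominator of (2.132) collapses to $a^2 + b^2$ in (2.133).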

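We can write the equations governing the evolution of the orientation as a vector equation in terms of $\vec\omega' = M^T\vec\omega$:

$$D\vec\sigma = f(\sigma)\,\vec\omega' + \tfrac{1}{2}\,\vec\sigma\times\vec\omega' + g(\sigma)\,\vec\sigma\,(\vec\sigma\cdot\vec\omega'), \tag{2.134}$$

with two auxiliary functions

$$f(x) = \frac{1}{2}\,\frac{x\sin x}{1 - \cos x}, \tag{2.135}$$

$$g(x) = \frac{1 - f(x)}{x^2}. \tag{2.136}$$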
The equation of motion for the orientation vector is surprisingly simple. Both auxiliary functions have finite limits at zero:

$$\lim_{x\to 0} f(x) = 1, \qquad \lim_{x\to 0} g(x) = \frac{1}{12}. \tag{2.137}$$

Orientation vectors with magnitude less than or equal to $\pi$ are enough to specify all orientations, and the equations of motion in this region have no singularities. The orientation vector may develop magnitudes greater than $\pi$, but then we replace it by the equivalent orientation vector with magnitude less than $\pi$. And there is no hurry to do this, because the equations are not singular until the magnitude reaches $2\pi$. Thus we have a complete nonsingular specification of the rigid body dynamics.
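Equation (2.134) is compact enough to implement directly. In this Python sketch, $f$ is evaluated as $(x/2)\cot(x/2)$, which equals (2.135) by the half-angle identities, and both $f$ and $g$ switch to Taylor series near zero to avoid the $0/0$ cancellation (the series coefficients come from the expansion of $x\cot x$, which the text does not give). The right-hand side is checked against a hypothetical orientation-vector path with known derivative, with $\omega'$ extracted from $M^T DM$ as in (2.94). A helper implements the $\vec\sigma \to \vec\sigma - 2\pi\hat\sigma$ rule.

```python
import math


def f_aux(x):
    """Equation (2.135), f(x) = (x/2) cot(x/2), with its series near 0."""
    if abs(x) < 1e-4:
        return 1.0 - x * x / 12.0
    return 0.5 * x / math.tan(0.5 * x)


def g_aux(x):
    """Equation (2.136), g(x) = (1 - f(x)) / x^2, with its series near 0."""
    if abs(x) < 0.05:
        return 1.0 / 12.0 + x * x / 720.0 + x**4 / 30240.0
    return (1.0 - f_aux(x)) / (x * x)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def orientation_rate(sigma_vec, omega_body):
    """Right-hand side of equation (2.134)."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    sw = sum(a * b for a, b in zip(sigma_vec, omega_body))
    sxw = cross(sigma_vec, omega_body)
    return tuple(f_aux(s) * w + 0.5 * c + g_aux(s) * sv * sw
                 for w, c, sv in zip(omega_body, sxw, sigma_vec))


def renormalize(sigma_vec):
    """Replace sigma by sigma - 2 pi sigma-hat when its magnitude exceeds pi."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s <= math.pi:
        return tuple(sigma_vec)
    return tuple(x * (1 - 2 * math.pi / s) for x in sigma_vec)


def rotation(sigma_vec):
    """Equation (2.118)."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s == 0.0:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    u = [x / s for x in sigma_vec]
    A = [[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]]
    c, sn = math.cos(s), math.sin(s)
    return [[(c if i == j else 0.0) + sn * A[i][j] + (1 - c) * u[i] * u[j]
             for j in range(3)] for i in range(3)]


def body_omega(path, t, h=1e-6):
    """omega' from A(omega') = M^T DM, with DM by a central difference."""
    M, Mp, Mm = rotation(path(t)), rotation(path(t + h)), rotation(path(t - h))
    W = [[sum(M[k][i] * (Mp[k][j] - Mm[k][j]) / (2 * h) for k in range(3))
          for j in range(3)] for i in range(3)]
    return (W[2][1], W[0][2], W[1][0])


def path(t):
    """A hypothetical orientation-vector path with known derivative (0.5, -0.2, 0.3)."""
    return (0.4 + 0.5 * t, -1.0 - 0.2 * t, 0.7 + 0.3 * t)


print(orientation_rate(path(1.0), body_omega(path, 1.0)))   # close to (0.5, -0.2, 0.3)
```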


A practical matter 
To use the orientation vector we are presented with the practical 
problem of converting between the orientation vector representa- 
tion of the orientation and other representations. We can consider 
the rotation matrix M as an intermediate universal representation. 
Whatever generalized coordinates have been chosen, we must be 
able to compute the rotation matrix to which the coordinates cor- 
respond. We must also solve the converse problem—the determi- 
nation of the generalized coordinates from the rotation matrix. 
We already have the explicit form for the rotation matrix in 
terms of the orientation vector in equation (2.118), repeated here 
for convenience, 


$$M = I\cos\sigma + A\sin\sigma + S(1 - \cos\sigma). \tag{2.138}$$


We can solve the converse problem by examining this same equa- 
tion. We note that of the contributions to M two parts are sym- 
metric and one is antisymmetric. We can isolate the antisymmet- 
ric component by subtracting the transpose. We have 


$$A\sin\sigma = \tfrac{1}{2}\left(M - M^T\right). \tag{2.139}$$

But the matrix A is simply related to the orientation vector:

$$A = A\!\left(\frac{\vec\sigma}{\sigma}\right). \tag{2.140}$$
We use the inverse operation $A^{-1}$ that extracts the components of a 3-vector from an antisymmetric 3×3 matrix. So we have

$$\frac{\vec\sigma}{\sigma} = A^{-1}(A). \tag{2.141}$$

Note that information about the magnitude $\sigma$ of the rotation is not available in A by itself. However, the combination of $M$ and its transpose produces a scaled version of A from which the magnitude of the rotation can be recovered:

$$\frac{\sin\sigma}{\sigma}\,\vec\sigma = A^{-1}\!\left(\tfrac{1}{2}\left(M - M^T\right)\right). \tag{2.142}$$

The length of the vector represented by the components on the left-hand side is just $\sin\sigma$. This does not uniquely determine $\sigma$, because $\sigma$ spans the interval 0 to $\pi$. To completely determine $\sigma$, and thus $\vec\sigma$, we need more information, say by determining $\cos\sigma$. We can get $\cos\sigma$ easily enough. Since $\operatorname{trace} I = 3$, $\operatorname{trace} A = 0$, and $\operatorname{trace} S = 1$, the trace of $M$ is $1 + 2\cos\sigma$, so

$$\cos\sigma = \frac{1}{2}\left[\frac{1}{2}\operatorname{trace}\left(M + M^T\right) - 1\right]. \tag{2.143}$$


Having determined both the sine and the cosine of $\sigma$ we can determine $\sigma$. Of course, some of these expressions contain divisions by $\sigma$, which may be zero, but if $\sigma = 0$ then the orientation vector is just the zero vector. This completes the solution of the practical problem of going to and from the orientation vector.
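The converse conversion, (2.142) together with (2.143), takes only a few lines of code. This Python sketch uses `atan2` to combine $\sin\sigma$ and $\cos\sigma$ into $\sigma \in [0, \pi]$. It handles $\sigma = 0$ as the text describes. As an assumption of this sketch, it simply refuses rotations within a small tolerance of $\sigma = \pi$, where the antisymmetric part of $M$ vanishes and (2.142) no longer fixes the axis.

```python
import math


def rotation(sigma_vec):
    """Equation (2.118), used here only to generate test matrices."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s == 0.0:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    u = [x / s for x in sigma_vec]
    A = [[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]]
    c, sn = math.cos(s), math.sin(s)
    return [[(c if i == j else 0.0) + sn * A[i][j] + (1 - c) * u[i] * u[j]
             for j in range(3)] for i in range(3)]


def orientation_vector(M, tol=1e-9):
    """Recover the orientation vector from M using (2.142) and (2.143)."""
    # (2.142): A^{-1}((M - M^T)/2) = (sin(sigma)/sigma) * sigma_vec
    v = (0.5 * (M[2][1] - M[1][2]),
         0.5 * (M[0][2] - M[2][0]),
         0.5 * (M[1][0] - M[0][1]))
    sin_s = math.sqrt(sum(x * x for x in v))
    # (2.143): cos(sigma) = (trace(M + M^T)/2 - 1)/2, and trace(M + M^T) = 2 trace(M)
    trace = M[0][0] + M[1][1] + M[2][2]
    cos_s = 0.5 * (trace - 1.0)
    if sin_s < tol:
        if cos_s > 0.0:
            return (0.0, 0.0, 0.0)            # sigma = 0: the zero vector
        raise ValueError("rotation by about pi: axis not fixed by (2.142)")
    sigma = math.atan2(sin_s, cos_s)          # sigma in (0, pi)
    return tuple(x * sigma / sin_s for x in v)


sv = (0.3, -0.4, 1.2)
print(orientation_vector(rotation(sv)))       # recovers (0.3, -0.4, 1.2) up to rounding
```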


Composition of rotations 
We can ask the following question: “To which rotation does the 
composition of two rotations correspond?” Alternatively, “What 
is the algebra of orientation vectors?” We have all the pieces; answering this question is just a matter of computation. Given two rotations represented by the rotation matrices $M_1$ and $M_2$, the rotation matrix of the composition of these rotations is $M = M_2M_1$. Each of these rotation matrices can be converted to the equivalent orientation vector. We can define the composition $\vec\sigma = \vec\sigma_2 \circ \vec\sigma_1$.

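Let $\alpha = (\sin\sigma)/\sigma$, $\beta = (1 - \cos\sigma)/\sigma^2$, and $\gamma = \cos\sigma$, and let $\alpha_i$, $\beta_i$, $\gamma_i$ denote the same quantities for $\vec\sigma_i$. By direct calculation we find

$$\begin{aligned} \alpha\vec\sigma ={}& \vec\sigma_1\left\{\tfrac{1}{2}\alpha_1(1 + \gamma_2) - \tfrac{1}{2}\alpha_2\beta_1(\vec\sigma_1\cdot\vec\sigma_2)\right\} \\ &+ \vec\sigma_2\left\{\tfrac{1}{2}\alpha_2(1 + \gamma_1) - \tfrac{1}{2}\alpha_1\beta_2(\vec\sigma_1\cdot\vec\sigma_2)\right\} \\ &+ (\vec\sigma_1\times\vec\sigma_2)\left\{-\tfrac{1}{2}\alpha_1\alpha_2 + \tfrac{1}{2}\beta_1\beta_2(\vec\sigma_1\cdot\vec\sigma_2)\right\} \end{aligned} \tag{2.144}$$

and

$$\gamma = \tfrac{1}{2}(1 + \gamma_1)(1 + \gamma_2) + \tfrac{1}{2}\beta_1\beta_2(\vec\sigma_1\cdot\vec\sigma_2)^2 - 1 - (\vec\sigma_1\cdot\vec\sigma_2)\,\alpha_1\alpha_2, \tag{2.145}$$

which together determine $\vec\sigma$.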
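Formulas (2.144) and (2.145) are easy to get wrong by hand, so a numerical check is worthwhile. This Python sketch evaluates them for two hypothetical orientation vectors and compares with the composite matrix $M = M_2M_1$: by (2.142) the antisymmetric part of $M$ yields $\alpha\vec\sigma$, and by (2.143) its trace yields $\gamma$.

```python
import math


def rotation(sigma_vec):
    """Equation (2.118)."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s == 0.0:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    u = [x / s for x in sigma_vec]
    A = [[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]]
    c, sn = math.cos(s), math.sin(s)
    return [[(c if i == j else 0.0) + sn * A[i][j] + (1 - c) * u[i] * u[j]
             for j in range(3)] for i in range(3)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def abg(sigma_vec):
    """alpha = sin(s)/s, beta = (1 - cos s)/s^2, gamma = cos s, with their limits at s = 0."""
    s = math.sqrt(sum(x * x for x in sigma_vec))
    if s == 0.0:
        return 1.0, 0.5, 1.0
    return math.sin(s) / s, (1 - math.cos(s)) / s**2, math.cos(s)


def composition(s1, s2):
    """Equations (2.144) and (2.145): (alpha * sigma_vec, gamma) for M = M2 M1."""
    a1, b1, g1 = abg(s1)
    a2, b2, g2 = abg(s2)
    d = sum(x * y for x, y in zip(s1, s2))                       # sigma1 . sigma2
    x12 = (s1[1] * s2[2] - s1[2] * s2[1],
           s1[2] * s2[0] - s1[0] * s2[2],
           s1[0] * s2[1] - s1[1] * s2[0])                        # sigma1 x sigma2
    c1 = 0.5 * a1 * (1 + g2) - 0.5 * a2 * b1 * d
    c2 = 0.5 * a2 * (1 + g1) - 0.5 * a1 * b2 * d
    cx = -0.5 * a1 * a2 + 0.5 * b1 * b2 * d
    alpha_sigma = tuple(c1 * p + c2 * q + cx * r for p, q, r in zip(s1, s2, x12))
    gamma = 0.5 * (1 + g1) * (1 + g2) + 0.5 * b1 * b2 * d * d - 1 - d * a1 * a2
    return alpha_sigma, gamma


s1, s2 = (0.3, -0.2, 0.5), (-0.4, 0.9, 0.1)
M = matmul(rotation(s2), rotation(s1))
alpha_sigma, gamma = composition(s1, s2)
print(alpha_sigma, (0.5 * (M[2][1] - M[1][2]), 0.5 * (M[0][2] - M[2][0]), 0.5 * (M[1][0] - M[0][1])))
print(gamma, 0.5 * (M[0][0] + M[1][1] + M[2][2] - 1))
```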

Well, the formulas are rather complicated, but it turns out that with a little rearrangement they can be made quite simple. Let c = cos(σ/2) and s = sin(σ/2), and define q⃗ = (s/σ)θ. The vector q⃗ is a scaled version of θ. Instead of having the magnitude σ as θ does, the vector q⃗ has the magnitude s = sin(σ/2). Notice that if σ is restricted to magnitudes less than π, then the magnitude of the rotation σ can be recovered from the magnitude of q⃗. Thus, with this restriction, the vector q⃗ corresponds to a unique rotation, and no extra information is needed. Nevertheless it is convenient to keep track of the cosine of the half-angle as well as the sine, so let q = c = cos(σ/2).²² The magnitude of q⃗ is s and q = c, so q² + q⃗ · q⃗ = 1. We can reexpress the formulas for the composition of two rotations in terms of q⃗ and q for each rotation. We have


$$\vec{q} = q_2\,\vec{q}_1 + q_1\,\vec{q}_2 + \vec{q}_2\times\vec{q}_1 \qquad (2.146)$$

$$q = q_2\,q_1 - \vec{q}_2\cdot\vec{q}_1. \qquad (2.147)$$


Now that is a significant simplification! The 4-tuple formed from q and the three components of q⃗ gives the components of Hamilton’s quaternion. We see that the vector part of the quaternion that represents an orientation is a scaled version of the orientation vector.
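A small Python sketch of (2.146) and (2.147) follows. It assumes that rotations act by Rodrigues' formula, that is, rotation by the angle σ = |θ| about the axis θ/σ. The names are hypothetical. The sketch converts orientation vectors to the pair (q, q⃗), composes them, and converts back. It then checks that rotating a vector by the composite agrees with rotating it by θ₁ and then by θ₂.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def to_quaternion(theta):
    """(q, qvec) with q = cos(sigma/2) and qvec = (sin(sigma/2)/sigma) theta."""
    s = math.sqrt(dot(theta, theta))
    if s == 0.0:
        return 1.0, [0.0, 0.0, 0.0]
    return math.cos(s / 2), [math.sin(s / 2) / s * x for x in theta]

def compose(Q2, Q1):
    """Quaternion of the composite rotation M2 M1, equations (2.146)-(2.147)."""
    q2, v2 = Q2
    q1, v1 = Q1
    vec = [q2 * a + q1 * b + c for a, b, c in zip(v1, v2, cross(v2, v1))]
    return q2 * q1 - dot(v2, v1), vec

def to_orientation(Q):
    """Back to an orientation vector, using sigma = 2 atan2(|qvec|, q)."""
    q, v = Q
    s = math.sqrt(dot(v, v))
    if s == 0.0:
        return [0.0, 0.0, 0.0]
    sigma = 2.0 * math.atan2(s, q)
    return [sigma / s * x for x in v]

def rotate(theta, u):
    """Rodrigues' rotation of u by angle |theta| about theta/|theta| (for checking)."""
    s = math.sqrt(dot(theta, theta))
    if s == 0.0:
        return list(u)
    n = [x / s for x in theta]
    n_cross_u, n_dot_u = cross(n, u), dot(n, u)
    return [ui * math.cos(s) + wi * math.sin(s) + ni * n_dot_u * (1.0 - math.cos(s))
            for ui, wi, ni in zip(u, n_cross_u, n)]
```

The composite stays normalized (q² + q⃗ · q⃗ = 1) without any extra work. Because `to_orientation` uses `atan2`, it does not need the angle to lie in a particular range.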

Hamilton discovered an even more elegant way of writing the formula for the composition of two rotations. Introduce three unit quaternions i, j, k, such that i² = j² = k² = −1, ij = k, jk = i, ki = j, and each pair of unit quaternions anticommutes: ij = −ji, and so on. Denote the three components of q⃗ by (q¹, q², q³), and q by q⁰. Then define the composite quaternion

$$\mathbf{q} = q^0\,\mathbf{1} + \mathbf{i}\,q^1 + \mathbf{j}\,q^2 + \mathbf{k}\,q^3.$$

With the rule for how the unit quaternions multiply, the formula for the composition of two rotations becomes simply a multiplication. The quaternions generalize the idea of complex numbers. In fact, apart from the real and complex numbers, they are the only finite-dimensional associative division algebra over the reals. Unlike the complex numbers, however, they do not form a field, because their multiplication is not commutative. The unit quaternions cannot be represented simply as real numbers or complex numbers, particularly because of their anticommuting properties. However, they do have a representation as 2×2 matrices of complex numbers. The units are

$$\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad (2.148)$$

$$\mathbf{i} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} \qquad (2.149)$$

$$\mathbf{j} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \qquad (2.150)$$

$$\mathbf{k} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} \qquad (2.151)$$

where the i on the right-hand side is the usual imaginary unit, i² = −1. These matrices are related to the Pauli spin matrices.

²²This notation has the potential for great confusion: q is not the magnitude of the vector q⃗. Watch out!


There are other representations, but this is carrying us too far 
afield. 
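The multiplication rules for the unit quaternions can be checked directly with the 2×2 complex matrices (2.148) through (2.151). So can the claim that composition becomes a multiplication. This Python sketch uses hypothetical names. It multiplies the matrices exactly with Python's built-in complex numbers. It then compares the matrix product of two quaternions with the scalar and vector parts given by (2.146) and (2.147), where the left factor plays the role of q₂.

```python
# Units as 2x2 complex matrices, equations (2.148)-(2.151).
ONE = ((1, 0), (0, 1))
I = ((0, 1j), (1j, 0))
J = ((0, -1), (1, 0))
K = ((1j, 0), (0, -1j))

def mul(A, B):
    """2x2 matrix product."""
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
                 for r in range(2))

def neg(A):
    return tuple(tuple(-x for x in row) for row in A)

def to_matrix(q):
    """Matrix of q0*1 + q1*i + q2*j + q3*k."""
    q0, q1, q2, q3 = q
    return tuple(tuple(q0 * ONE[r][c] + q1 * I[r][c] + q2 * J[r][c] + q3 * K[r][c]
                       for c in range(2)) for r in range(2))

def hamilton_product(a, b):
    """a*b written with (2.146)-(2.147): scalar a0 b0 - a.b, vector a0 b + b0 a + a x b."""
    a0, av = a[0], a[1:]
    b0, bv = b[0], b[1:]
    dot = sum(x * y for x, y in zip(av, bv))
    cross = (av[1] * bv[2] - av[2] * bv[1],
             av[2] * bv[0] - av[0] * bv[2],
             av[0] * bv[1] - av[1] * bv[0])
    vec = tuple(a0 * y + b0 * x + c for x, y, c in zip(av, bv, cross))
    return (a0 * b0 - dot,) + vec
```

With integer components every entry is computed exactly. So equality, not approximate agreement, can be checked. For example, `mul(I, J) == K` holds, and so does `mul(J, I) == neg(K)`.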

If we are faced with the task of composing rotations repeatedly, 
then the quaternions will be a handy intermediate representation. 
The quaternions also have the advantage that we do not need to 
worry about whether the angle of rotation is in the appropriate 
range. However, the equation of motion for the orientation vector 
is simpler than the equation of motion for the quaternion, so we 
will stick with the orientation vector when we need nonsingular 
equations of motion for the orientation. 


Exercise 2.16: Composition 


Verify that the rule for composition of two rotations in terms of the 
orientation vectors (equations 2.144 and 2.145) is equivalent to the rule
for multiplying two quaternions (equation 2.147). 


Exercise 2.17: Equation of motion 


Find the equation of motion for the orientation quaternion in terms of 
the angular velocity vector. 
2.14 Summary


A rigid body is an example of a mechanical system with con- 
straints. Thus, in a sense this chapter on rigid bodies was nothing 
but an extended example of the application of the ideas developed 
in the first chapter. 

We first showed that the kinetic energy for a rigid body sepa- 
rates into a translational kinetic energy and a rotational kinetic 
energy. The center of mass plays a special role in this separation. 

The rotational kinetic energy is simply expressed in terms of 
the inertia tensor and the angular velocity vector. 

One choice for generalized coordinates is the Euler angles. They 
form suitable generalized coordinates, but are otherwise not spe- 
cial or well motivated. 

Having developed the expressions for the kinetic energy that 
take into account the body constraints and expressed the remain- 
ing freedoms of motion in terms of suitable generalized coordi- 
nates, the equations of motion for the free rigid body are just 
Lagrange’s equations. 

The vector angular momentum is conserved if there are no ex- 
ternal torques. The time derivative of the body components of the 
angular momentum can be written entirely in terms of the body 
components of the angular momentum, and the three principal 
moments of inertia. The body components of angular momentum 
form a self-contained dynamical sub-system. 

The Lagrange equations for the Euler angles are singular for 
some Euler angles. Other choices of generalized coordinates like 
the Euler angles have similar problems. Equations of motion for 
the orientation vector are nonsingular. 


2.15 Projects 


Exercise 2.18: Free rigid body 

Write and demonstrate a program that reproduces diagrams like fig- 
ure 2.3. Can you find trajectories that are asymptotic to the unstable 
relative equilibrium on the intermediate principal axis? 


Exercise 2.19: Rotation of Mercury

In the 1960s it was discovered that Mercury has a rotation period that is precisely 2/3 times its orbital period. We can see this resonant behavior in the spin-orbit model problem. We can also play with nudging Mercury a bit to see how far off the rotation rate can be and still be trapped in this spin-orbit resonance. If the mismatch in angular velocity is too great, Mercury’s rotation is no longer resonantly locked to its orbit. Set ε = 0.026 and e = 0.2.


a. Write a program for the spin-orbit problem so that the resonance dynamics can be investigated numerically. You will need to know (or, better, show!) that f satisfies the equation

$$Df = n\,(1 - e^2)^{1/2}\left(\frac{a}{R}\right)^2 \qquad (2.152)$$

with

$$\frac{a}{R} = \frac{1 + e\cos f}{1 - e^2}.$$

b. Show that the 3:2 resonance is stable by numerically integrating the system when the rotation is not exactly in resonance and observing that the angle θ − (3/2)f oscillates.

c. Find the range of initial θ̇ for which this resonance angle oscillates.


Exercise 2.20: Precession of the equinox 


The Earth spins very nearly about the largest moment of inertia, and the 
spin axis is tilted by about 23° to the orbit normal. There is a gravity- 
gradient torque on the Earth from the Sun that causes the spin-axis of 
the Earth to precess. Investigate this precession in the approximation 
that the orbit of the Earth is circular, and the Earth is axisymmetric. 
Determine the rate of precession in terms of the moments of inertia of 
the Earth. 


3 


Hamiltonian Mechanics 


Numerical experiments are just what their name 
implies: experiments. In describing and evaluating 
them, one should enter the state of mind of the 
experimental physicist, rather than that of the 
mathematician. Numerical experiments cannot be 
used to prove theorems; but, from the physicist’s 
point of view, they do often provide convincing 
evidence for the existence of a phenomenon. We 
will therefore follow an informal, descriptive and 
non-rigorous approach. Briefly stated, our aim will 
be to understand the fundamental properties of 
dynamical systems rather than to prove them. 


Michel Hénon, “Numerical Exploration of 
Hamiltonian Systems,” in Chaotic Behavior of 
Deterministic Systems, (1983). 


The formulation of mechanics with generalized coordinates and momenta as dynamical state variables is called the Hamiltonian formulation. The Hamiltonian formulation of mechanics is equivalent to the Lagrangian formulation, but each presents a useful point of view. The Lagrangian formulation is especially useful in the initial formulation of a system. The Hamiltonian formulation is especially useful in understanding the evolution, particularly when there are symmetries and conserved quantities.

For each continuous symmetry of a mechanical system there 
is a conserved quantity. If the generalized coordinates can be 
chosen to reflect a symmetry, then, by the Lagrange equations, 
the conjugate momentum is conserved. We have seen that such 
conserved quantities allow us to deduce important properties of 
the motion. For instance, consideration of the energy and angular 
momentum allowed us to deduce that rotation of a free rigid body 
about the axis of intermediate moment of inertia is unstable, and 
that rotation about the other principal axes is stable. For the 
axisymmetric top, we used two conserved momenta to reexpress 
the equations governing the evolution of the tilt angle so that 
they only involve the tilt angle and its derivative. The evolution 
of the tilt angle can be determined independently and has simply
periodic solutions. Consideration of the conserved momenta has 
provided key insight. The Hamiltonian formulation is motivated 
by the desire to focus attention on the momenta. 

In the Lagrangian formulation the momenta are, in a sense, sec- 
ondary quantities: the momenta are functions of the state space 
variables, but the evolution of the state space variables depends 
on the state space variables and not on the momenta. To make 
use of any conserved momenta requires fooling around with the 
specific equations. The momenta can be rewritten in terms of the 
coordinates and the velocities, so, locally, we can solve for the 
velocities in terms of the coordinates and momenta. For a given 
mechanical system and given coordinates, the momenta and the 
velocities can be deduced from one another. Thus we can repre- 
sent the dynamical state of the system in terms of the coordinates 
and momenta just as well as with the coordinates and the veloci- 
ties. If we use the coordinates and momenta to represent the state 
and write the associated state derivative in terms of the coordinates and momenta, then we have a self-contained system. This
formulation of the equations governing the evolution of the system 
has the advantage that if some of the momenta are conserved, the 
remaining equations are immediately simplified. 

The Lagrangian formulation of mechanics has provided the 
means to investigate the motion of complicated mechanical sys- 
tems. We have found that dynamical systems exhibit a bewilder- 
ing variety of possible motions. The motion is sometimes rather 
simple, and sometimes the motion is very complicated. Sometimes 
the evolution depends very sensitively on the initial conditions, 
and sometimes it is insensitive. And sometimes there are orbits 
that maintain resonance relationships with a drive. Consider the 
periodically driven pendulum. The driven pendulum can behave 
more or less as an undriven pendulum with extra wiggles. It can 
move in a strongly chaotic manner. It can move in resonance with 
the drive, oscillating once for every two cycles of the drive, or 
looping around once per drive cycle. Or consider the Moon. The 
Moon rotates synchronously with its orbital motion, always point- 
ing roughly the same face to the Earth. However, Mercury rotates 
three times every two times it circles the Sun, and Hyperion ro- 
tates chaotically. How can we make sense of this? How do we put 
the possible motions of these systems in relation to each other? 
What other motions are possible? The Hamiltonian formulation 
of dynamics gives us much more than the stated goal of expressing
the system derivative in terms of potentially conserved quantities. 
The Hamiltonian formulation provides a convenient framework in 
which the possible motions may be placed and understood. We 
will be able to see the range of stable resonance motions, and the 
range of states reached by chaotic trajectories, and discover other 
unsuspected possible motions. The Hamiltonian formulation leads 
to many additional insights. 


3.1 Hamilton’s Equations 


The momenta are given by momentum state functions of the time, 
the coordinates, and the velocities.¹ Locally we can find inverse functions that give the velocities in terms of the time, the coordinates, and the momenta. We can use this inverse function
to represent the state in terms of the coordinates and momenta 
rather than the coordinates and velocities. The equations of mo- 
tion when recast in terms of coordinates and momenta are called 
Hamilton’s canonical equations. 

We present three derivations of Hamilton’s equations. The 
first derivation is guided by the strategy outlined above and uses 
nothing more complicated than implicit functions and the chain 
rule. The second derivation first abstracts a key part of the first 
derivation and then applies the more abstract machinery to derive 
Hamilton’s equations. The third uses the action principle. 

Lagrange’s equations give us the time derivative of the momentum p on a path q:

$$Dp(t) = \partial_1 L(t, q(t), Dq(t)), \qquad (3.1)$$

where

$$p(t) = \partial_2 L(t, q(t), Dq(t)). \qquad (3.2)$$

To eliminate Dq we need to solve equation (3.2) for Dq in terms of p.

Let V be the function that gives the velocities in terms of the time, coordinates, and momenta. Defining V is a problem of functional inverses. To prevent confusion we use names for the variables that do not have mnemonic significance. Let

$$a = \partial_2 L(b, c, d), \qquad (3.3)$$

then V satisfies²

$$d = V(b, c, a). \qquad (3.4)$$

The Lagrange equation (3.1) can be rewritten in terms of p using V:

$$Dp(t) = \partial_1 L(t, q(t), V(t, q(t), p(t))). \qquad (3.5)$$

We can also use V to rewrite equation (3.2) as an equation for Dq in terms of t, q, and p:

$$Dq(t) = V(t, q(t), p(t)). \qquad (3.6)$$

Equations (3.5) and (3.6) give the rate of change of q and p along realizable paths as functions of t, q, and p along the paths.

¹Here we restrict our attention to Lagrangians that depend only on the time, the coordinates, and the velocities.

These equations fulfill our goal of expressing the equations of motion entirely in terms of coordinates and momenta, but we can find a more convenient representation. Define the function

$$\mathcal{L}(t, q, p) = L(t, q, V(t, q, p)), \qquad (3.7)$$

which is the Lagrangian reexpressed as a function of time, coordinates, and momenta. For the equations of motion we need ∂₁L evaluated with the appropriate arguments. Consider

$$\begin{aligned}
\partial_1\mathcal{L}(t, q, p) &= \partial_1 L(t, q, V(t, q, p)) + \partial_2 L(t, q, V(t, q, p))\,\partial_1 V(t, q, p) \\
&= \partial_1 L(t, q, V(t, q, p)) + p\,\partial_1 V(t, q, p),
\end{aligned} \qquad (3.8)$$

where we used the chain rule in the first step and the inverse property of V in the second step. Introducing the momentum selector³ P(t, q, p) = p, and using the property ∂₁P = 0, we have

$$\begin{aligned}
\partial_1 L(t, q, V(t, q, p)) &= \partial_1\mathcal{L}(t, q, p) - P(t, q, p)\,\partial_1 V(t, q, p) \\
&= \partial_1(\mathcal{L} - PV)(t, q, p) \\
&= -\partial_1 H(t, q, p),
\end{aligned} \qquad (3.9)$$

where the Hamiltonian H is defined by⁴

$$H = PV - \mathcal{L}. \qquad (3.10)$$

²The following properties hold: d = V(b, c, ∂₂L(b, c, d)) and a = ∂₂L(b, c, V(b, c, a)).

³P = I₂.

The Lagrange equation for Dp becomes simply

$$Dp(t) = -\partial_1 H(t, q(t), p(t)). \qquad (3.11)$$

The equation for Dq can also be written in terms of H. Consider

$$\begin{aligned}
\partial_2 H(t, q, p) &= \partial_2(PV - \mathcal{L})(t, q, p) \\
&= V(t, q, p) + p\,\partial_2 V(t, q, p) - \partial_2\mathcal{L}(t, q, p).
\end{aligned} \qquad (3.12)$$

To carry out the derivative of ℒ we write it out in terms of L:

$$\partial_2\mathcal{L}(t, q, p) = \partial_2 L(t, q, V(t, q, p))\,\partial_2 V(t, q, p) = p\,\partial_2 V(t, q, p), \qquad (3.13)$$

using the inverse property of V again. So, putting equations (3.12) and (3.13) together, we obtain

$$\partial_2 H(t, q, p) = V(t, q, p). \qquad (3.14)$$

On paths for which Dq(t) = V(t, q(t), p(t)) we have

$$Dq(t) = \partial_2 H(t, q(t), p(t)). \qquad (3.15)$$


Equations (3.11) and (3.15) give the derivatives of the coordi- 
nate and momentum path functions in terms of the time, coordi- 
nates, and momenta. These equations are known as Hamilton’s 
equations: 


$$\begin{aligned}
Dq(t) &= \partial_2 H(t, q(t), p(t)) \\
Dp(t) &= -\partial_1 H(t, q(t), p(t)).
\end{aligned} \qquad (3.16)$$


The first equation is just a restatement of the relationship of the 
momenta to the velocities in terms of the Hamiltonian and holds 
for any path, whether or not it is a realizable path. The second 
equation holds only for realizable paths. 


⁴The overall minus sign in the definition of the Hamiltonian is traditional.
Hamilton’s equations⁵ have an especially simple and symmetrical form. Just as Lagrange’s equations are constructed from a real-valued function, the Lagrangian, Hamilton’s equations are constructed from a real-valued function, the Hamiltonian. The Hamiltonian function is⁶

$$H(t, q, p) = p\,V(t, q, p) - L(t, q, V(t, q, p)). \qquad (3.17)$$

The Hamiltonian has the same value as the energy function E (see equation 1.140), except that the velocities are expressed in terms of time, coordinates, and momenta by V:

$$H(t, q, p) = E(t, q, V(t, q, p)). \qquad (3.18)$$


Illustration 
Let’s try something simple: the motion of a particle of mass m with potential energy V(x, y). A Lagrangian is

$$L(t; x, y; v_x, v_y) = \tfrac{1}{2} m\left(v_x^2 + v_y^2\right) - V(x, y). \qquad (3.19)$$

To form the Hamiltonian we first find the momenta p = ∂₂L(t, q, v): p_x = m v_x and p_y = m v_y. Solving for the velocities in terms of the momenta is easy here: v_x = p_x/m and v_y = p_y/m. The Hamiltonian is H(t, q, p) = pv − L(t, q, v) with v reexpressed in terms of (t, q, p):

$$H(t; x, y; p_x, p_y) = \frac{p_x^2 + p_y^2}{2m} + V(x, y). \qquad (3.20)$$


⁵In traditional notation Hamilton’s equations are written

$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q},$$

or as separate equations for each component:

$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i}.$$

⁶Traditionally, the Hamiltonian is written

$$H = p\dot{q} - L.$$

This way of writing the Hamiltonian confuses the values of functions with the functions that generate them: both q̇ and L have to be reexpressed as functions of the time, coordinates, and momenta.
The kinetic energy is a homogeneous quadratic form in the velocities, so the energy is T + V and the Hamiltonian is the energy expressed in terms of momenta rather than velocities. Hamilton’s equations for Dq are

$$\begin{aligned}
Dx &= p_x/m \\
Dy &= p_y/m.
\end{aligned} \qquad (3.21)$$

Note that on paths, where v_x = Dx and v_y = Dy, these just restate the relation between the momenta and the velocities. Hamilton’s equations for Dp are

$$\begin{aligned}
Dp_x &= -\partial_0 V(x, y) \\
Dp_y &= -\partial_1 V(x, y).
\end{aligned} \qquad (3.22)$$


The rate of change of the linear momentum is minus the gradient 
of the potential energy. 
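Here is a Python sketch of the Hamiltonian state derivative for this illustration. The book computes the partial derivatives exactly with `partial`. This sketch substitutes central finite differences, which are only an approximation, and its names are hypothetical (`H_rectangular` loosely mirrors the book's `H-rectangular`). Applied to the Hamiltonian (3.20), it returns (1, ∂₂H, −∂₁H), which reproduces (3.21) and (3.22).

```python
def partials(H, state, slot, h=1e-6):
    """Central-difference gradient of H(t, q, p) w.r.t. q (slot=1) or p (slot=2)."""
    grad = []
    for i in range(len(state[slot])):
        plus = [state[0], list(state[1]), list(state[2])]
        minus = [state[0], list(state[1]), list(state[2])]
        plus[slot][i] += h
        minus[slot][i] -= h
        grad.append((H(*plus) - H(*minus)) / (2 * h))
    return grad

def phase_space_derivative(H):
    """State derivative (1, dH/dp, -dH/dq) of an H-state (t, q, p)."""
    def derivative(state):
        return (1.0, partials(H, state, 2), [-g for g in partials(H, state, 1)])
    return derivative

def H_rectangular(m, V):
    """Hamiltonian (3.20): (px^2 + py^2)/(2m) + V(x, y)."""
    def H(t, q, p):
        return (p[0] ** 2 + p[1] ** 2) / (2 * m) + V(q[0], q[1])
    return H
```

Take the quadratic potential V(x, y) = k(x² + y²)/2 with m = 2 and k = 5. The state (0, (1, 2), (3, 4)) then yields approximately (1, (1.5, 2), (−5, −10)), in agreement with Dx = p_x/m and Dp_x = −∂₀V.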


Exercise 3.1: Deriving Hamilton’s equations 

For each of the following Lagrangians derive the Hamiltonian and Hamil- 
ton’s equations. These problems are simple enough to do by hand. 

a. A Lagrangian for a planar pendulum is L(t, θ, θ̇) = ½ m l² θ̇² + m g l cos θ.

b. A Lagrangian for a particle of mass m with a two-dimensional potential energy V(x, y) = (x² + y²)/2 + x²y − y³/3 is L(t; x, y; ẋ, ẏ) = ½ m(ẋ² + ẏ²) − V(x, y).

c. A Lagrangian for a particle of mass m constrained to move on a sphere of radius R is L(t; θ, φ; θ̇, φ̇) = ½ m R²(θ̇² + (φ̇ sin θ)²), where θ is the colatitude and φ is the longitude on the sphere.


Exercise 3.2: Sliding pendulum 


For the pendulum with a sliding support (see exercise 1.20) derive a 
Hamiltonian and Hamilton’s equations. 


Hamiltonian state 

Given a coordinate path q, and a Lagrangian L, the corresponding 
momentum path p is given by equation (3.2). Equation (3.15) ex- 
presses the same relationship in terms of the corresponding Hamil- 
tonian H. That these relations are valid for any path, whether 
or not it is a realizable path, allows us to abstract to arbitrary 
velocity and momentum at a moment. At a moment, the momentum p for the state tuple (t, q, v) is p = ∂₂L(t, q, v). We also have v = ∂₂H(t, q, p). In the Lagrangian formulation the state
of the system at a moment can be specified by the local state
tuple (t,q,v) of time, generalized coordinates, and generalized 
velocities. Lagrange’s equations determine a unique path ema- 
nating from this state. In the Hamiltonian formulation the state 
can be specified by the tuple (t,q,p) of time, generalized coordi- 
nates, and generalized momenta. Hamilton’s equations determine 
a unique path emanating from this state. The Lagrangian state 
tuple (t, q,v) encodes exactly the same information as the Hamil- 
tonian state tuple (t, q, p); we need a Lagrangian or a Hamiltonian 
to relate them. The two formulations are equivalent in that for 
equivalent initial states the same coordinate path emanates from 
them. 

The Lagrangian state derivative is constructed from the Lagrange equations by solving for the highest-order derivative and abstracting to arbitrary positions and velocities at a moment.⁷
The Lagrangian state path is generated by integration of the La- 
grangian state derivative given an initial Lagrangian state (t, q, v). 
Similarly, the Hamiltonian state derivative can be constructed 
from Hamilton’s equations by abstracting to arbitrary positions 
and momenta at a moment. Hamilton’s equations are a set of 
first-order differential equations in explicit form. The Hamilto- 
nian state derivative can be directly written in terms of them. The 
Hamiltonian state path is generated by integration of the Hamilto- 
nian state derivative given an initial Hamiltonian state (t, q, p). If 
these state paths are obtained by integrating the state derivatives 
with equivalent initial states, then the coordinate path components of these state paths are the same and satisfy the Lagrange
equations. The coordinate path and the momentum path compo- 
nents of the Hamiltonian state path satisfy Hamilton’s equations. 
The Hamiltonian formulation and the Lagrangian formulation are 
equivalent. 

Given a path q, the Lagrangian state path and the Hamiltonian state path can be deduced from it. The Lagrangian state path Γ[q] can be constructed from a path q simply by taking derivatives. The Lagrangian state path satisfies:

$$\Gamma[q](t) = (t, q(t), Dq(t)). \qquad (3.23)$$

⁷In the construction of the Lagrangian state derivative from the Lagrange equations we must solve for the highest-order derivative. The solution process requires the inversion of the matrix ∂₂∂₂L. In the construction of Hamilton’s equations, the construction of V from the momentum state function ∂₂L requires the inversion of the same matrix. If the Lagrangian formulation has singularities, the singularities cannot be avoided by going to the Hamiltonian formulation.


The Lagrangian state path is uniquely determined by the path q. The Hamiltonian state path Π_L[q] can also be constructed from the path q, but the construction requires a Lagrangian. The Hamiltonian state path satisfies

$$\Pi_L[q](t) = (t, q(t), \partial_2 L(t, q(t), Dq(t))) = (t, q(t), p(t)). \qquad (3.24)$$


The Hamiltonian state tuple is not uniquely determined by the 
path q because it depends upon our choice of Lagrangian, which 
is not unique. 
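The two state paths can be sketched numerically in Python for a system with one degree of freedom. The names are hypothetical, and derivatives are approximated by central differences rather than computed exactly. In the usage example, the two Lagrangians differ by the total time derivative of t·q. They therefore give the same Lagrange equations but different momenta. This illustrates why Π_L[q] depends on the choice of Lagrangian while Γ[q] does not.

```python
import math

def derivative(f, h=1e-5):
    """Central-difference approximation to Df for a real function f."""
    return lambda t: (f(t + h) - f(t - h)) / (2 * h)

def lagrangian_state_path(q):
    """Gamma[q](t) = (t, q(t), Dq(t)), equation (3.23)."""
    Dq = derivative(q)
    return lambda t: (t, q(t), Dq(t))

def hamiltonian_state_path(L, q, h=1e-5):
    """Pi_L[q](t) = (t, q(t), d2L(t, q(t), Dq(t))), equation (3.24)."""
    Dq = derivative(q)
    def p(t):
        v = Dq(t)
        return (L(t, q(t), v + h) - L(t, q(t), v - h)) / (2 * h)
    return lambda t: (t, q(t), p(t))
```

Take q = sin, L₁(t, q, v) = ½mv² − ½kq², and L₂ = L₁ + q + tv. The momentum component of Π_{L₂}[q](t) then exceeds that of Π_{L₁}[q](t) by t, while both Lagrangians share the same Γ[q].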

The 2n-dimensional space whose elements are labeled by the n generalized coordinates qⁱ and the n generalized momenta pᵢ is called phase space. The components of the generalized coordinates and momenta are collectively called the phase-space components.⁸
The dynamical state of the system is completely specified by the 
phase state tuple (t,q,p), given a Lagrangian or Hamiltonian to 
provide the map between velocities and momenta. 


Computing Hamilton’s equations 

Hamilton’s equations are a system of first order differential equa- 
tions. We presented a procedural formulation of Lagrange’s equa- 
tions as a first order system in section 1.7. The following formu- 
lation of Hamilton’s equations is analogous: 


⁸The term phase space was introduced by Josiah Willard Gibbs in his formulation of statistical mechanics. The Hamiltonian plays a fundamental role in the Boltzmann-Gibbs formulation of statistical mechanics, and in both the Heisenberg and Schrödinger approaches to quantum mechanics.

The momentum p can be viewed as the coordinate representation of a linear form on the tangent space. Thus p q̇ is a scalar quantity, which is invariant under time-independent coordinate transformations of the configuration space. The set of momentum forms comprises an n-dimensional vector space at each point of configuration space, called the cotangent space. The collection of all cotangent spaces of a configuration space forms a space called the cotangent bundle of the configuration manifold.
(define ((Hamilton-equations Hamiltonian) q p)
  (let ((H-state-path (qp->H-state-path q p)))
    (- (D H-state-path)
       (compose (phase-space-derivative Hamiltonian)
                H-state-path))))


The Hamiltonian state derivative is computed as follows: 


(define ((phase-space-derivative Hamiltonian) H-state)
  (up 1
      (((partial 2) Hamiltonian) H-state)
      (- (((partial 1) Hamiltonian) H-state))))


The state in the Hamiltonian formulation is composed of the time, 
the coordinates, and the momenta. We call this an H-state, to dis- 
tinguish it from the state in the Lagrangian formulation. We can 
select the components of the Hamiltonian state with the selectors 
time, coordinate, momentum. We construct Hamiltonian states 
from their components with up. The first component of the state 
is time, so the first component of the state derivative is one, the 
time rate of change of time. Given procedures q and p implement- 
ing coordinate and momentum path functions, the Hamiltonian 
state path can be constructed with the following procedure: 


(define ((qp->H-state-path q p) t)
  (up t (q t) (p t)))


The Hamilton-equations procedure returns the residuals of Hamil- 
ton’s equations for the given paths. 
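For comparison, the residual computation can be sketched in Python for a single degree of freedom, with H taking scalar arguments. This is a hypothetical analogue of `Hamilton-equations`, not a translation of the scmutils machinery. The derivatives of the paths and the partials of H are approximated by central differences.

```python
import math

def hamilton_equations_residuals(H, q, p, h=1e-5):
    """Residuals of Hamilton's equations (3.16) for scalar paths q and p (sketch)."""
    def d(f, t):
        return (f(t + h) - f(t - h)) / (2 * h)
    def residual(t):
        qt, pt = q(t), p(t)
        dH_dq = (H(t, qt + h, pt) - H(t, qt - h, pt)) / (2 * h)
        dH_dp = (H(t, qt, pt + h) - H(t, qt, pt - h)) / (2 * h)
        # the time component, D(identity) - 1, is identically zero
        return (0.0, d(q, t) - dH_dp, d(p, t) + dH_dq)
    return residual
```

Consider the oscillator H = p²/(2m) + kq²/2 with m = 2 and k = 8. The paths q(t) = cos 2t and p(t) = −4 sin 2t give residuals that vanish to within the finite-difference error. Replacing p by the zero path instead leaves a large residual in the second component, because that path violates the relation between momentum and velocity.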

For example, a procedure implementing the Hamiltonian for a 
point mass with potential energy V(x, y) is


(define ((H-rectangular m V) H-state)
  (let ((q (coordinate H-state))
        (p (momentum H-state)))
    (+ (/ (square p) (* 2 m))
       (V (ref q 0) (ref q 1)))))


Hamilton’s equations are:⁹


⁹By default literal functions map reals to reals; the default type for a literal function is (-> Real Real). Here, the potential energy V takes two real arguments and returns a real.
(show-expression
 (((Hamilton-equations
    (H-rectangular
     m
     (literal-function 'V (-> (X Real Real) Real))))
   (up (literal-function 'x) (literal-function 'y))
   (down (literal-function 'p_x) (literal-function 'p_y)))
  't))

$$\begin{pmatrix}
0 \\
\begin{pmatrix} Dx(t) - \dfrac{p_x(t)}{m} \\ Dy(t) - \dfrac{p_y(t)}{m} \end{pmatrix} \\
\begin{bmatrix} Dp_x(t) + \partial_0 V(x(t), y(t)) & Dp_y(t) + \partial_1 V(x(t), y(t)) \end{bmatrix}
\end{pmatrix}$$


The zero in the first element of the structure of Hamilton’s equations residuals is just the tautology that time advances uniformly. The time function is just the identity, so its derivative is 1 and the residual is zero. The equations in the second element relate the coordinate paths and the momentum paths. The equations in the third element give the rate of change of the momenta in terms of the applied forces.


Exercise 3.3: Computing Hamilton’s equations 


Check your answers to exercise 3.1 with the Hamilton equations proce- 
dures. 


3.1.1 The Legendre Transformation 


The Legendre transformation abstracts a key part of the process 
of transforming from the Lagrangian to the Hamiltonian formula- 
tion of mechanics—the replacement of functional dependence on 
generalized velocities with functional dependence on generalized 
momenta. The momentum state function is defined as a partial 
derivative of the Lagrangian, a real-valued function of time, co- 
ordinates, and velocities. The Legendre transformation provides 
an inverse that gives the velocities in terms of the momenta: we 
are able to write the velocities as a partial derivative of a different real-valued function of time, coordinates, and momenta.¹⁰

Given a real-valued function F, if we can find a real-valued function G such that DF = (DG)⁻¹, then we say that F and G are related by a Legendre transform.

Locally we can define the inverse function¹¹ V of DF so that DF ∘ V = I, where I is the identity function I(w) = w. Consider the composite function ℱ = F ∘ V. The derivative of ℱ is

$$D\mathcal{F} = (DF\circ V)\,DV = I\,DV. \qquad (3.25)$$

Using the product rule and DI = 1,

$$D\mathcal{F} = D(IV) - V, \qquad (3.26)$$

$$V = D(IV) - D\mathcal{F} = D(IV - \mathcal{F}). \qquad (3.27)$$


The integral is determined up to a constant of integration. If we define

$$G = IV - \mathcal{F}, \qquad (3.28)$$

then we have

$$V = DG. \qquad (3.29)$$


The function G has the desired property that DG is the inverse function V of DF. The derivation just given applies equally well if the arguments of F and G have multiple components.

Given a relation w = DF(v) for some given function F, then v = DG(w) for G = IV − F ∘ V, where V is the inverse function of DF, provided it exists.
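The recipe G = IV − F ∘ V can be tried numerically. In this Python sketch the inverse V of DF is found by bisection, which assumes that DF is increasing on a known bracket. The helper names are hypothetical. For F = exp, DF = exp has the inverse V = log, so G(w) = w log w − w and DG = log, as the derivation predicts.

```python
import math

def inverse(f, w, lo, hi, iterations=100):
    """Bisection for f(v) = w; assumes f is increasing on [lo, hi] and brackets w."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def legendre(F, DF, lo, hi):
    """G = I V - F o V, where V inverts DF on [lo, hi] (equation 3.28)."""
    def G(w):
        v = inverse(DF, w, lo, hi)
        return w * v - F(v)
    return G
```

G is insensitive to small errors in v, since the derivative of wv − F(v) with respect to v, namely w − DF(v), vanishes at the exact inverse. A central difference of G therefore recovers DG = V accurately.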

A picture may help (see figure 3.1). The curve is the graph 
of the function DF. Turned sideways, it is also the graph of the 
function DG, because DG is the inverse function of DF. The 
integral of DF from v₀ to v is F(v) − F(v₀); this is the area below


¹⁰The Legendre transformation is more general than its use in mechanics in that it captures the relationship between conjugate variables in systems as diverse as thermodynamics, circuits, and field theory.

¹¹This can be done so long as the derivative of DF is not zero.
Figure 3.1 The Legendre transform can be interpreted in terms of geometric areas. The curve is the graph of DF, and viewed sideways is the graph of DG = (DF)⁻¹. This figure should remind you of the geometric interpretation of the product rule for derivatives, or alternatively, integration by parts.


the curve from v₀ to v. Likewise the integral of DG from w₀ to w is G(w) − G(w₀); this is the area to the left of the curve from w₀ to w. The union of these two regions has the area wv − w₀v₀. So

$$wv - w_0 v_0 = F(v) - F(v_0) + G(w) - G(w_0), \qquad (3.30)$$

which is the same as

$$wv - F(v) - G(w) = w_0 v_0 - F(v_0) - G(w_0). \qquad (3.31)$$


The left-hand side depends on the point labeled by w and v, and the right-hand side depends on the point labeled by w₀ and v₀, so these can only be equal to a constant, independent of the variable endpoints. As the point is changed the combination G(w) + F(v) − wv is invariant. So

$$G(w) = wv - F(v) + C, \qquad (3.32)$$

with constant C. The requirement for G depends only on DG, so we can choose to define G with C = 0.


Legendre transformations with passive arguments 

Let F be a real-valued function of two arguments, and

$$w = \partial_1 F(x, v). \qquad (3.33)$$

If we can find a real-valued function G such that

$$v = \partial_1 G(x, w), \qquad (3.34)$$

we say that F and G are related by a Legendre transformation, that the second argument in each function is active, and that the first argument is passive in the transformation.

If the function F can be locally inverted with respect to the second argument, we can define

$$v = V(x, w), \qquad (3.35)$$

giving

$$w = \partial_1 F(x, V(x, w)) = W(x, w), \qquad (3.36)$$

where W = I₁ is the selector function for the second argument.

For the active arguments the derivation goes through as before. The first argument to F and G is just along for the ride; it is a passive argument. Let

$$\mathcal{F}(x, w) = F(x, V(x, w)), \qquad (3.37)$$

then define

$$G = WV - \mathcal{F}. \qquad (3.38)$$


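We can check that G has the property V = ∂₁G by carrying out the derivative:

$$\begin{aligned}
\partial_1 G &= \partial_1(WV - \mathcal{F}) \\
&= V + W\,\partial_1 V - \partial_1\mathcal{F},
\end{aligned} \qquad (3.39)$$

but

$$\begin{aligned}
\partial_1\mathcal{F}(x, w) &= \partial_1 F(x, V(x, w))\,\partial_1 V(x, w) \\
&= W(x, w)\,\partial_1 V(x, w),
\end{aligned} \qquad (3.40)$$

or

$$\partial_1\mathcal{F} = W\,\partial_1 V. \qquad (3.41)$$

So

$$\partial_1 G = V, \qquad (3.42)$$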


as required. The active argument may have many components. 

The partial derivatives with respect to the passive arguments 
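are related in a remarkably simple way. Let’s calculate the derivative ∂₀G in pieces. First,

$$\partial_0(WV) = W\,\partial_0 V \qquad (3.43)$$

because ∂₀W = 0. To calculate ∂₀ℱ we must supply arguments:

$$\begin{aligned}
\partial_0\mathcal{F}(x, w) &= \partial_0 F(x, V(x, w)) + \partial_1 F(x, V(x, w))\,\partial_0 V(x, w) \\
&= \partial_0 F(x, V(x, w)) + W(x, w)\,\partial_0 V(x, w).
\end{aligned} \qquad (3.44)$$

Putting these together we find

$$\partial_0 G(x, w) = -\partial_0 F(x, V(x, w)) = -\partial_0 F(x, v). \qquad (3.45)$$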


The calculation is unchanged if the passive argument has many 
components. 
We can write the Legendre transformation more symmetrically:
$$ \begin{aligned} w &= \partial_1 F(x, v) \\ wv &= F(x, v) + G(x, w) \\ v &= \partial_1 G(x, w) \\ 0 &= \partial_0 F(x, v) + \partial_0 G(x, w). \end{aligned} \tag{3.46} $$


The last relation is not as trivial as it looks, because x enters the 
equations connecting w and v. With this symmetrical form, we 
see that the Legendre transform is its own inverse. 
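The symmetric relations (3.46) can be checked numerically. The sketch below assumes a hypothetical pair (not from the text) in which $x$ is passive: $F(x, v) = e^x v^2/2 + \sin x$, whose transform works out by hand to $G(x, w) = e^{-x} w^2/2 - \sin x$. All partial derivatives are estimated by central differences, with index 0 and 1 matching $\partial_0$ and $\partial_1$.

```python
import math

# Hypothetical example (not from the text): x is the passive argument,
# v the active one.  F(x, v) = exp(x) v^2 / 2 + sin(x) gives w = exp(x) v,
# so V(x, w) = exp(-x) w and G(x, w) = w V - F(x, V) = exp(-x) w^2 / 2 - sin(x).

def F(x, v):
    return math.exp(x) * v**2 / 2 + math.sin(x)

def G(x, w):
    return math.exp(-x) * w**2 / 2 - math.sin(x)

def partial(f, i, args, h=1e-6):
    """Central-difference partial derivative of f in argument i (0-based, like d_0, d_1)."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def residuals(x, v):
    """Residuals of the remaining relations (3.46) after defining w = d_1 F(x, v)."""
    w = partial(F, 1, (x, v))
    return (
        w * v - (F(x, v) + G(x, w)),                    # wv = F + G
        v - partial(G, 1, (x, w)),                      # v = d_1 G(x, w)
        partial(F, 0, (x, v)) + partial(G, 0, (x, w)),  # 0 = d_0 F + d_0 G
    )

print(residuals(0.3, 1.7))
```

The last residual is the passive relation; it is nonzero term by term ($\partial_0 F = e^x v^2/2 + \cos x$) but cancels against $\partial_0 G$.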


Exercise 3.4: Simple Legendre transforms 


For each of the following functions find the function that is related to the given function by the Legendre transform on the indicated active argument. Show that the Legendre transform relations hold for your solution, including the relations among passive arguments, if any.

a. $F(x) = a\sin x + b\cos x$; there are no passive arguments.

b. $F(x, y) = a\sin x\cos y$, with $x$ active.

c. $F(x, y, \dot{x}, \dot{y}) = x\dot{x}^2 + 3\dot{x}y + y\dot{y}^2$, with $\dot{x}$ and $\dot{y}$ active.


Hamilton’s equations from the Legendre transformation 
We can use the Legendre transformation with the Lagrangian 
playing the role of F and with the generalized velocity slot playing 
the role of the active argument. The Hamiltonian plays the role 
of G with the momentum slot active. The coordinate and time 
slots are passive arguments. 

The Lagrangian $L$ and the Hamiltonian $H$ are related by a Legendre transformation:
$$ e = (\partial_2 L)(a, b, c), \tag{3.47} $$
$$ ec = L(a, b, c) + H(a, b, e), \tag{3.48} $$
and
$$ c = (\partial_2 H)(a, b, e), \tag{3.49} $$
with passive equations
$$ 0 = \partial_0 L(a, b, c) + \partial_0 H(a, b, e), \tag{3.50} $$
$$ 0 = \partial_1 L(a, b, c) + \partial_1 H(a, b, e). \tag{3.51} $$
Presuming it exists, we can define the inverse of $\partial_2 L$ with respect to the last argument,
$$ c = \mathcal{V}(a, b, e), \tag{3.52} $$
and write the Hamiltonian
$$ H(a, b, c) = c\,\mathcal{V}(a, b, c) - L(a, b, \mathcal{V}(a, b, c)). \tag{3.53} $$


These relations are purely algebraic in nature.

On a path $q$ we have the momentum $p$:
$$ p(t) = \partial_2 L(t, q(t), Dq(t)), \tag{3.54} $$
and from the definition of $\mathcal{V}$
$$ Dq(t) = \mathcal{V}(t, q(t), p(t)). \tag{3.55} $$
The Legendre transform gives
$$ Dq(t) = \partial_2 H(t, q(t), p(t)). \tag{3.56} $$
This relation is purely algebraic and is valid for any path. The passive equation (3.51) gives
$$ \partial_1 L(t, q(t), Dq(t)) = -\partial_1 H(t, q(t), p(t)), \tag{3.57} $$
but the left-hand side can be rewritten using the Lagrange equations, which say that $Dp(t) = \partial_1 L(t, q(t), Dq(t))$ along a realizable path, so
$$ Dp(t) = -\partial_1 H(t, q(t), p(t)). \tag{3.58} $$


This equation is only valid for realizable paths, because we used 
the Lagrange equations to derive it. Equations (3.56) and (3.58) 
are Hamilton’s equations. 

The remaining passive equation is
$$ \partial_0 L(t, q(t), Dq(t)) = -\partial_0 H(t, q(t), p(t)). \tag{3.59} $$


We have already found that if the Lagrangian has no explicit time dependence ($\partial_0 L = 0$) then energy is conserved. This passive equation says that if the Lagrangian has no explicit time dependence then the Hamiltonian will also have no explicit time dependence ($\partial_0 H = 0$). So if the Hamiltonian has no explicit time dependence then it is a conserved quantity.
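As an informal numerical illustration (not a proof, and not the book’s method), one can integrate Hamilton’s equations (3.56) and (3.58) for a time-independent Hamiltonian and watch $H$ stay constant. The pendulum parameters and the classical Runge–Kutta (RK4) integrator below are our assumptions. RK4 is not symplectic, so $H$ is conserved only up to a small integration error.

```python
import math

# Hypothetical pendulum parameters (not from the text).
m, l, g = 1.0, 1.0, 9.8

def H(q, p):
    """A Hamiltonian with no explicit time dependence."""
    return p**2 / (2 * m * l**2) - m * g * l * math.cos(q)

def rhs(q, p):
    """Hamilton's equations (3.56) and (3.58): Dq = d_2 H, Dp = -d_1 H."""
    return p / (m * l**2), -m * g * l * math.sin(q)

def rk4_step(q, p, dt):
    k1 = rhs(q, p)
    k2 = rhs(q + dt / 2 * k1[0], p + dt / 2 * k1[1])
    k3 = rhs(q + dt / 2 * k2[0], p + dt / 2 * k2[1])
    k4 = rhs(q + dt * k3[0], p + dt * k3[1])
    return (q + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            p + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def energy_drift(q, p, dt=1e-3, steps=10_000):
    """Largest |H - H0| seen while integrating from (q, p)."""
    E0, worst = H(q, p), 0.0
    for _ in range(steps):
        q, p = rk4_step(q, p, dt)
        worst = max(worst, abs(H(q, p) - E0))
    return worst

print(energy_drift(1.0, 0.0), energy_drift(0.0, 7.0))
```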


Exercise 3.5: 


Using Hamilton’s equations, show directly that the Hamiltonian is a 
conserved quantity if the Hamiltonian has no explicit time dependence. 


Legendre transforms of quadratic functions 

We cannot implement the Legendre transform in general because 
it involves finding the functional inverse of an arbitrary function. 
However, many physical systems can be described by Lagrangians 
that are quadratic forms in the generalized velocities. For such 
functions the generalized momenta are linear functions of the gen- 
eralized velocities, and thus explicitly invertible. 
More generally, we can compute a Legendre transformation for polynomial functions where the leading term is a quadratic form:
$$ F(v) = \tfrac{1}{2} vMv + bv + c. \tag{3.60} $$
We can assume $M$ is symmetric,¹² because it defines a quadratic form. We can find linear expressions for $w$:
$$ w = DF(v) = vM + b. \tag{3.61} $$
So if $M$ is invertible we can solve for $v$ in terms of $w$. Thus we may define a function $\mathcal{V}$ such that
$$ v = \mathcal{V}(w) = M^{-1}(w - b), \tag{3.62} $$
and then
$$ G(w) = w\,\mathcal{V}(w) - F(\mathcal{V}(w)). \tag{3.63} $$
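Substituting (3.62) into (3.63) gives the closed form $G(w) = \tfrac{1}{2}(w - b)M^{-1}(w - b) - c$. The sketch below checks that identity exactly with rational arithmetic, for a hypothetical symmetric $2 \times 2$ matrix $M$, vector $b$, and constant $c$ chosen only for illustration; Cramer’s rule stands in for the matrix inverse.

```python
from fractions import Fraction as Fr

# Hypothetical 2-component example (not from the text):
#   F(v) = 1/2 v.M.v + b.v + c   with M symmetric and invertible.
M = [[Fr(2), Fr(1)],
     [Fr(1), Fr(3)]]
b = [Fr(1), Fr(-2)]
c = Fr(5)

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def solve2(A, y):
    """Solve A x = y for a 2x2 matrix by Cramer's rule (exact with Fractions)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(y[0] * A[1][1] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]

def F(v):
    return Fr(1, 2) * dot(v, matvec(M, v)) + dot(b, v) + c

def DF(v):
    """Equation (3.61): w = Mv + b (M is symmetric, so vM = Mv)."""
    return [mv + bi for mv, bi in zip(matvec(M, v), b)]

def V(w):
    """Equation (3.62): v = M^{-1}(w - b)."""
    return solve2(M, [wi - bi for wi, bi in zip(w, b)])

def G(w):
    """Equation (3.63): G(w) = w V(w) - F(V(w))."""
    v = V(w)
    return dot(w, v) - F(v)

def G_closed(w):
    """Closed form obtained by substituting (3.62): 1/2 (w-b).M^{-1}.(w-b) - c."""
    d = [wi - bi for wi, bi in zip(w, b)]
    return Fr(1, 2) * dot(d, solve2(M, d)) - c

w = [Fr(3), Fr(1, 2)]
print(G(w), G_closed(w))
```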


Computing Hamiltonians 
We implement the Legendre transform for quadratic functions by the procedure:¹³

(define (Legendre-transform F)
  (let ((w-of-v (D F)))
    (define (G w)
      (let ((z (dual-zero w)))
        (let ((M ((D w-of-v) z))
              (b (w-of-v z)))
          (let ((v (/ (- w b) M)))
            (- (* w v) (F v))))))
    G))


The procedure Legendre-transform takes a procedure of one argument and returns the procedure that is associated with it by the Legendre transform. If $w = DF(v)$, $wv = F(v) + G(w)$, and $v = DG(w)$ specifies a one-argument Legendre transformation, then $G$ is the function associated with $F$ by the Legendre transform: $G = I\mathcal{V} - F \circ \mathcal{V}$, where $\mathcal{V}$ is the functional inverse of $DF$.

12 Let $\mathbf{M}$ be the matrix representation of $M$; then $\mathbf{M} = \mathbf{M}^{\mathsf{T}}$.

13 The division operation, denoted by / in the Legendre-transform procedure, is generic over mathematical objects. We interpret the division in the matrix representation: if a vector y is divided by a matrix M this is interpreted as a request to solve the linear system Mx = y, where x is the unknown vector.

We can use the Legendre-transform procedure to compute a Hamiltonian from a Lagrangian:

(define ((Lagrangian->Hamiltonian Lagrangian) H-state)
  (let ((t (time H-state))
        (q (coordinate H-state))
        (p (momentum H-state)))
    (define (L qdot)
      (Lagrangian (up t q qdot)))
    ((Legendre-transform L) p)))


Notice that the one-argument Legendre-transform procedure is sufficient. The passive variables are given no special attention; they are just passed around.

The Lagrangian may be obtained from the Hamiltonian by the 
procedure: 


(define ((Hamiltonian->Lagrangian Hamiltonian) L-state)
  (let ((t (time L-state))
        (q (coordinate L-state))
        (qdot (velocity L-state)))
    (define (H p)
      (Hamiltonian (up t q p)))
    ((Legendre-transform H) qdot)))


Notice that the two procedures Hamiltonian->Lagrangian and 
Lagrangian->Hamiltonian are identical, except for the names. 
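The symmetry of the two procedures can be seen in a rough Python analogue. This is a sketch under stated assumptions, not the scmutils implementation: central finite differences at zero stand in for the automatic derivative `D` and `dual-zero` (for quadratic functions these differences are exact apart from rounding), a small Gaussian-elimination routine plays the role of the generic `/`, and the Lagrangian uses a hypothetical potential $V(x, y) = x^2 + xy$ with $m = 2$. Applying the two converters in sequence should recover the original Lagrangian.

```python
# Sketch of the Legendre-transform / Lagrangian->Hamiltonian pair in Python.
# Assumption: finite differences at zero stand in for D and dual-zero;
# for quadratic functions central differences are exact apart from rounding.

def grad(f, z, h=0.1):
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g

def hessian(f, z, h=0.1):
    n = len(z)
    def at(i, si, j, sj):
        zz = list(z)
        zz[i] += si * h
        zz[j] += sj * h
        return f(zz)
    return [[(at(i, 1, j, 1) - at(i, 1, j, -1) - at(i, -1, j, 1) + at(i, -1, j, -1))
             / (4 * h * h) for j in range(n)] for i in range(n)]

def solve(A, y):
    """Gaussian elimination with partial pivoting: returns x with A x = y."""
    n = len(y)
    aug = [list(row) + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def legendre_transform(F):
    """G(w) = w.v - F(v), where v solves M v = w - b with M = D^2 F(0), b = DF(0)."""
    def G(w):
        z = [0.0] * len(w)
        M, b = hessian(F, z), grad(F, z)
        v = solve(M, [wi - bi for wi, bi in zip(w, b)])
        return sum(wi * vi for wi, vi in zip(w, v)) - F(v)
    return G

def lagrangian_to_hamiltonian(L):
    return lambda t, q, p: legendre_transform(lambda qdot: L(t, q, qdot))(p)

def hamiltonian_to_lagrangian(H):
    return lambda t, q, qdot: legendre_transform(lambda p: H(t, q, p))(qdot)

# Hypothetical example: rectangular point mass, m = 2, V(x, y) = x^2 + x y.
def L_rect(t, q, qdot, m=2.0):
    x, y = q
    return 0.5 * m * (qdot[0] ** 2 + qdot[1] ** 2) - (x * x + x * y)

H = lagrangian_to_hamiltonian(L_rect)
L_back = hamiltonian_to_lagrangian(H)
print(H(0.0, [1.0, 2.0], [3.0, -1.0]), L_back(0.0, [1.0, 2.0], [0.5, -1.5]))
```

As in the Scheme version, the two converters differ only in which slot is active; the passive time and coordinate are simply closed over.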

For example, the Hamiltonian for the motion of the point mass with the potential energy $V(x, y)$ may be computed from the Lagrangian:

(define ((L-rectangular m V) local)
  (let ((q (coordinate local))
        (qdot (velocity local)))
    (- (* 1/2 m (square qdot))
       (V (ref q 0) (ref q 1)))))


And the Hamiltonian is:

(show-expression
 ((Lagrangian->Hamiltonian
   (L-rectangular
    'm
    (literal-function 'V (-> (X Real Real) Real))))
  (up 't (up 'x 'y) (down 'p_x 'p_y))))

$$ V(x, y) + \frac{p_x^2}{2m} + \frac{p_y^2}{2m} $$


Exercise 3.6: On a helical track 


A uniform cylinder of mass $M$, radius $R$, and height $h$ is mounted so as to rotate freely on a vertical axis. A mass point of mass $m$ is constrained to move on a uniform frictionless helical track of pitch $\beta$ (measured in radians per meter of drop along the cylinder) mounted on the surface of the cylinder (see figure 3.2). The mass is acted upon by standard gravity ($g = 9.8\,\mathrm{m\,s^{-2}}$).

Figure 3.2

a. What are the degrees of freedom of this system? Pick and describe a convenient set of generalized coordinates for this problem. Write a Lagrangian to describe the dynamical behavior. It may help to know that the moment of inertia of the cylinder around its axis is $\tfrac{1}{2}MR^2$. You may find it easier to do the algebra if various constants are combined and represented as single symbols.


b. Make a Hamiltonian for the system. Write Hamilton’s equations for 
the system. Are there any conserved quantities? 


c. If we release the mass point at time t = 0 at the top of the track 
with zero initial speed and let it slide down, what is the motion of the 
system? 


Exercise 3.7: An ellipsoidal bowl 


Consider a point particle of mass $m$ constrained to move in a bowl and acted upon by a uniform gravitational acceleration $g$. The bowl is ellipsoidal, with height $z = ax^2 + by^2$. Make a Hamiltonian for this system. Are there any immediate deductions you can make about this system?


3.1.2 Hamiltonian Action Principle 


The previous two derivations of Hamilton’s equations have made 
use of the Lagrange equations. Hamilton’s equations can also be 
derived directly from the action principle. 

The action is the integral of the Lagrangian along a path:
$$ S[q](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[q]. \tag{3.64} $$


The action is stationary with respect to variations of the path that 
preserve the configuration at the endpoints (for Lagrangians that 
are functions of time, coordinates, and velocities). 

We can rewrite the integrand in terms of the Hamiltonian:
$$ L \circ \Gamma[q](t) = p(t)\,Dq(t) - H(t, q(t), p(t)), \tag{3.65} $$
with $p(t) = \partial_2 L(t, q(t), Dq(t))$. The Legendre transformation construction gives
$$ Dq(t) = \partial_2 H(t, q(t), p(t)), \tag{3.66} $$
which is one of Hamilton’s equations, the one that does not depend on the path being a realizable path. Using
$$ \Pi_L[q](t) = (t, q(t), \partial_2 L(t, q(t), Dq(t))) = (t, q(t), p(t)), \tag{3.67} $$
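the integrand is
$$ L \circ \Gamma[q] = p\,Dq - H \circ \Pi_L[q]. \tag{3.68} $$
The variation of the action is then
$$ \begin{aligned} \delta S[q](t_1, t_2) &= \int_{t_1}^{t_2} \delta\big(p\,Dq - H \circ \Pi_L[q]\big) \\ &= \int_{t_1}^{t_2} \big(\delta p\,Dq + p\,\delta Dq - (DH \circ \Pi_L[q])\,\delta\Pi_L[q]\big) \\ &= \int_{t_1}^{t_2} \big\{\delta p\,Dq + p\,D\delta q - (\partial_1 H \circ \Pi_L[q])\,\delta q - (\partial_2 H \circ \Pi_L[q])\,\delta p\big\}, \end{aligned} \tag{3.69} $$
where $\delta p$ is the variation in the momentum.¹⁴ Integrating the second term by parts, using $D(p\,\delta q) = Dp\,\delta q + p\,D\delta q$, we get
$$ \delta S[q](t_1, t_2) = p\,\delta q\,\Big|_{t_1}^{t_2} + \int_{t_1}^{t_2} \big\{\delta p\,Dq - Dp\,\delta q - (\partial_1 H \circ \Pi_L[q])\,\delta q - (\partial_2 H \circ \Pi_L[q])\,\delta p\big\}. \tag{3.70} $$
The variations are constrained so that $\delta q(t_1) = \delta q(t_2) = 0$, so the integrated part vanishes. The variation of the action is
$$ \delta S[q](t_1, t_2) = \int_{t_1}^{t_2} \big\{(Dq - \partial_2 H \circ \Pi_L[q])\,\delta p - (Dp + \partial_1 H \circ \Pi_L[q])\,\delta q\big\}. \tag{3.71} $$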


14 The variation of the momentum $\delta p$ does not need to be further expanded in this argument because it turns out that the factor multiplying it is zero. However, it is handy to see how it is related to the variations in the coordinate path $\delta q$:
$$ \delta p(t) = \partial_1\partial_2 L(t, q(t), Dq(t))\,\delta q(t) + \partial_2\partial_2 L(t, q(t), Dq(t))\,D\delta q(t). $$

As a consequence of equation (3.66), the factor multiplying $\delta p$ is zero. We are left with
$$ \delta S[q](t_1, t_2) = -\int_{t_1}^{t_2} (Dp + \partial_1 H \circ \Pi_L[q])\,\delta q. \tag{3.72} $$
For the variation of the action to be zero for arbitrary variations, except for the endpoint conditions, we must have
$$ Dp = -\partial_1 H \circ \Pi_L[q], \tag{3.73} $$
which is the “dynamical” Hamilton’s equation.¹⁵
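The stationarity argument can be probed numerically. The sketch below assumes a hypothetical harmonic oscillator ($m = k = 1$, interval $[0, 1]$) and a variation $\eta(t) = \sin(\pi t)$ that vanishes at both endpoints. It evaluates the integral of $p\,Dq - H \circ \Pi_L[q]$ from (3.68) by Simpson’s rule, with $p = m\,Dq$ so that $\delta p$ is induced by $\delta q$ as in footnote 14, and estimates the first-order change of the action by a symmetric difference. For a realizable path it should vanish; for a non-realizable constant path it should not (here it equals $-2/\pi$).

```python
import math

# Hypothetical oscillator (not from the text): L = m v^2/2 - k q^2/2,
# so p = d_2 L = m v and H(t, q, p) = p^2/(2m) + k q^2/2.
m, k, T = 1.0, 1.0, 1.0

def H(t, q, p):
    return p**2 / (2 * m) + k * q**2 / 2

def eta(t):
    return math.sin(math.pi * t / T)          # variation vanishing at t = 0 and t = T

def deta(t):
    return (math.pi / T) * math.cos(math.pi * t / T)

def action(q, dq, eps, n=1000):
    """Simpson's rule for the integral of p Dq - H o Pi_L[q] (3.68) along q + eps*eta.
    The momentum is p = m Dq, so delta p is induced by delta q (footnote 14)."""
    def integrand(t):
        qe, dqe = q(t) + eps * eta(t), dq(t) + eps * deta(t)
        p = m * dqe
        return p * dqe - H(t, qe, p)
    h = T / n                                   # n must be even for Simpson's rule
    s = integrand(0.0) + integrand(T)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(i * h)
    return s * h / 3

def first_variation(q, dq, eps=1e-3):
    """(S(eps) - S(-eps)) / (2 eps): the first-order change of the action."""
    return (action(q, dq, eps) - action(q, dq, -eps)) / (2 * eps)

realizable = first_variation(math.cos, lambda t: -math.sin(t))    # solves D^2 q = -q
not_realizable = first_variation(lambda t: 1.0, lambda t: 0.0)    # constant path
print(realizable, not_realizable)
```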


3.1.3 A Wiring Diagram 


Figure 3.3 shows a summary of the functional relationship between the Lagrangian and the Hamiltonian descriptions of a dynamical system. The diagram shows a “circuit” interconnecting some “devices” with “wires”. The devices represent the mathematical functions that relate the quantities on their terminals. The wires represent identifications of the quantities on the terminals that they connect. For example, there is a box that represents the Lagrangian function. Given values $t$, $q$, and $\dot{q}$, the value of the Lagrangian $L(t, q, \dot{q})$ is on the terminal labeled $L$, which is wired to an addend terminal of an adder. There are other terminals of the Lagrangian that carry the values of the partial derivatives of the Lagrangian function.
The upper part of the diagram summarizes the relationship of the Hamiltonian to the Lagrangian. For example, the sum of the values on the terminals $L$ of the Lagrangian and $H$ of the Hamiltonian is the product of the value on the $\dot{q}$ terminal of the Lagrangian and the value on the $p$ terminal of the Hamiltonian. This is the active part of the Legendre transform. The passive variables are related by the corresponding partial derivatives being negations of each other. In the lower part of the diagram the equations of motion are indicated by the presence of the integrators, relating the dynamical quantities to their time derivatives.

15 It is sometimes asserted that the momenta have a different status in the Lagrangian and Hamiltonian formulations: that in the Hamiltonian framework the momenta are “independent” of the coordinates. From this it is argued that the variations $\delta q$ and $\delta p$ are arbitrary and independent, therefore implying that the factor multiplying each of them in the action integral (3.72) must independently be zero, apparently deriving both of Hamilton’s equations. The argument is fallacious: we can write $\delta p$ in terms of $\delta q$ (see footnote 14).

Figure 3.3 This is a “wiring diagram” describing the relationships among the dynamical quantities occurring in Lagrangian and Hamiltonian mechanics.

One can use this diagram to help understand the underlying unity of the Lagrangian and Hamiltonian formulations of mechanics. Lagrange’s equations are just the connection of the $\dot{p}$ wire to the $\partial_1 L$ terminal of the Lagrangian device. One of Hamilton’s equations is just the connection of the $\dot{p}$ wire (through the negation device) to the $\partial_1 H$ terminal of the Hamiltonian device. The other is just the connection of the $\dot{q}$ wire to the $\partial_2 H$ terminal of the Hamiltonian device. We see that the two formulations are consistent. One does not have to abandon any part of the Lagrangian formulation to use the Hamiltonian formulation: there are deductions that can be made using both simultaneously.


3.2 Poisson Brackets 


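Here we introduce the Poisson bracket. In terms of the Poisson bracket Hamilton’s equations have an elegant and symmetric expression.

Consider a function $F$ of time, coordinates, and momenta. The value of $F$ along the path $\sigma(t) = (t, q(t), p(t))$ is $(F \circ \sigma)(t) = F(t, q(t), p(t))$. The time derivative of $F \circ \sigma$ is
$$ \begin{aligned} D(F \circ \sigma) &= (DF \circ \sigma)\,D\sigma \\ &= \partial_0 F \circ \sigma + (\partial_1 F \circ \sigma)\,Dq + (\partial_2 F \circ \sigma)\,Dp. \end{aligned} \tag{3.74} $$
If the phase-space path is a realizable path for a system with Hamiltonian $H$, then $Dq$ and $Dp$ can be reexpressed using Hamilton’s equations:
$$ \begin{aligned} D(F \circ \sigma) &= \partial_0 F \circ \sigma + (\partial_1 F \circ \sigma)(\partial_2 H \circ \sigma) - (\partial_2 F \circ \sigma)(\partial_1 H \circ \sigma) \\ &= \partial_0 F \circ \sigma + (\partial_1 F\,\partial_2 H - \partial_2 F\,\partial_1 H) \circ \sigma \\ &= \partial_0 F \circ \sigma + \{F, H\} \circ \sigma, \end{aligned} \tag{3.75} $$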
where the Poisson bracket $\{F, H\}$ of $F$ and $H$ is defined by¹⁶
$$ \{F, H\} = \partial_1 F\,\partial_2 H - \partial_2 F\,\partial_1 H. \tag{3.76} $$
Note that the Poisson bracket of two functions on the phase state space is also a function on the phase state space.
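Equation (3.75) can be spot-checked along an exact solution. The sketch assumes a hypothetical oscillator $H = p^2/(2m) + kq^2/2$ with its realizable path $\sigma(t) = (t, A\cos\omega t, -mA\omega\sin\omega t)$, and the phase-space function $F(t, q, p) = qp$, for which $\partial_0 F = 0$. The bracket $\{F, H\} = p^2/m - kq^2$ is written out by hand from (3.76) and compared with a finite-difference derivative of $F \circ \sigma$.

```python
import math

# Hypothetical oscillator (not from the text): H = p^2/(2m) + k q^2/2,
# with the exact realizable path sigma(t) = (t, A cos wt, -m A w sin wt).
m, k, A = 2.0, 3.0, 0.7
w = math.sqrt(k / m)

def sigma(t):
    return t, A * math.cos(w * t), -m * A * w * math.sin(w * t)

def F(t, q, p):
    return q * p                      # a phase-space function with d_0 F = 0

def poisson_F_H(t, q, p):
    """{F, H} = d_1F d_2H - d_2F d_1H, with d_1F = p, d_2F = q, d_1H = k q, d_2H = p/m."""
    return p * (p / m) - q * (k * q)

def D_along(f, t, h=1e-5):
    """Central-difference time derivative of f o sigma."""
    return (f(*sigma(t + h)) - f(*sigma(t - h))) / (2 * h)

for t in (0.0, 0.4, 1.3):
    print(t, D_along(F, t), poisson_F_H(*sigma(t)))
```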

The coordinate selector $Q = I_1$ is an example of a function on phase state space: $Q(t, q, p) = q$. According to equation (3.75),
$$ Dq = D(Q \circ \sigma) = \{Q, H\} \circ \sigma = \partial_2 H \circ \sigma, \tag{3.77} $$
but this is the same as Hamilton’s equation
$$ Dq(t) = \partial_2 H(t, q(t), p(t)). \tag{3.78} $$
Similarly, the momentum selector $P = I_2$ is a function on phase state space: $P(t, q, p) = p$. We have
$$ Dp = D(P \circ \sigma) = \{P, H\} \circ \sigma = -\partial_1 H \circ \sigma, \tag{3.79} $$
which is the same as the other Hamilton’s equation
$$ Dp(t) = -\partial_1 H(t, q(t), p(t)). \tag{3.80} $$
So the Poisson bracket provides a uniform way of writing Hamilton’s equations:
$$ \begin{aligned} D(Q \circ \sigma) &= \{Q, H\} \circ \sigma \\ D(P \circ \sigma) &= \{P, H\} \circ \sigma. \end{aligned} \tag{3.81} $$
The Poisson bracket of any function with itself is zero, so we recover the conservation of energy for a system that has no explicit time dependence:
$$ DE = D(H \circ \sigma) = (\partial_0 H + \{H, H\}) \circ \sigma = \partial_0 H \circ \sigma, \tag{3.82} $$
which vanishes when $\partial_0 H = 0$.


Properties of the Poisson bracket 
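Let $F$, $G$, and $H$ be functions of time, position, and momentum, and let $c$ be independent of position and momentum.

16 In traditional notation the Poisson bracket is written
$$ \sum_i \left( \frac{\partial F}{\partial q^i}\frac{\partial H}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial H}{\partial q^i} \right). $$

The Poisson bracket is antisymmetric:
$$ \{F, G\} = -\{G, F\}. \tag{3.83} $$
It is bilinear (linear in each argument):
$$ \begin{align} \{F, G + H\} &= \{F, G\} + \{F, H\} \tag{3.84} \\ \{F, cG\} &= c\{F, G\} \tag{3.85} \\ \{F + G, H\} &= \{F, H\} + \{G, H\} \tag{3.86} \\ \{cF, G\} &= c\{F, G\}. \tag{3.87} \end{align} $$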


The Poisson bracket satisfies Jacobi’s identity:
$$ 0 = \{F, \{G, H\}\} + \{H, \{F, G\}\} + \{G, \{H, F\}\}. \tag{3.88} $$
All but the last of these properties can be verified immediately from the definition. Jacobi’s identity requires a little more effort to verify. We can use the computer to avoid this work. Define some literal phase-space functions of Hamiltonian type:


(define F
  (literal-function 'F
    (-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))

(define G
  (literal-function 'G
    (-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))

(define H
  (literal-function 'H
    (-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))

Then we check the Jacobi identity:

(pe ((+ (Poisson-bracket F (Poisson-bracket G H))
        (Poisson-bracket G (Poisson-bracket H F))
        (Poisson-bracket H (Poisson-bracket F G)))
     (up 't (up 'x 'y) (down 'px 'py))))
0


The residual is zero, so the Jacobi identity is satisfied for any three phase-space functions of two degrees of freedom.
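Without a symbolic system, a similar check can be done exactly for polynomial phase-space functions. The sketch below is a hypothetical stand-in for the literal functions above: it represents polynomials in $(x, y, p_x, p_y)$ as dictionaries of exponent tuples with `Fraction` coefficients, implements the bracket from footnote 16, and evaluates the Jacobi sum for three arbitrarily chosen polynomials. Unlike the literal-function check, this only covers the particular polynomials tested.

```python
from fractions import Fraction
from itertools import product

# Exact polynomials in (q_1..q_N, p_1..p_N), stored as {exponent tuple: coefficient}.
# The test polynomials are hypothetical; any polynomials would do.
N = 2

def add(a, b, sign=1):
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0) + sign * c
        if out[e] == 0:
            del out[e]
    return out

def mul(a, b):
    out = {}
    for (ea, ca), (eb, cb) in product(a.items(), b.items()):
        e = tuple(x + y for x, y in zip(ea, eb))
        out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}

def deriv(a, i):
    """Partial derivative in variable i: 0..N-1 are the q's, N..2N-1 the p's."""
    out = {}
    for e, c in a.items():
        if e[i] > 0:
            out[e[:i] + (e[i] - 1,) + e[i + 1:]] = c * e[i]
    return out

def poisson(f, g):
    """{f, g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), as in footnote 16."""
    out = {}
    for i in range(N):
        out = add(out, mul(deriv(f, i), deriv(g, N + i)))
        out = add(out, mul(deriv(f, N + i), deriv(g, i)), sign=-1)
    return out

def poly(*terms):
    """Build a polynomial from (coefficient, exponents) pairs."""
    out = {}
    for c, e in terms:
        out = add(out, {tuple(e): Fraction(c)})
    return out

# Variables ordered (x, y, p_x, p_y).
F = poly((1, (2, 0, 1, 0)), (3, (0, 1, 0, 2)), (-2, (1, 1, 0, 0)))
G = poly((5, (0, 0, 2, 1)), (1, (1, 0, 0, 1)), (Fraction(1, 2), (0, 2, 0, 0)))
H = poly((1, (0, 0, 2, 0)), (1, (0, 0, 0, 2)), (7, (3, 1, 0, 0)))

jacobi = add(add(poisson(F, poisson(G, H)), poisson(H, poisson(F, G))),
             poisson(G, poisson(H, F)))
print(jacobi)   # an empty dict is the zero polynomial
```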
Poisson brackets of conserved quantities

The Poisson bracket of conserved quantities is conserved. Let $F$ and $G$ be time-independent functions on the phase state space: $\partial_0 F = \partial_0 G = 0$. If $F$ and $G$ are conserved by the evolution under $H$ then
$$ \begin{aligned} 0 &= D(F \circ \sigma) = \{F, H\} \circ \sigma \\ 0 &= D(G \circ \sigma) = \{G, H\} \circ \sigma. \end{aligned} \tag{3.89} $$
So the Poisson brackets of $F$ and $G$ with $H$ are zero: $\{F, H\} = \{G, H\} = 0$. The Jacobi identity then implies
$$ \{\{F, G\}, H\} = 0. \tag{3.90} $$
Because $F$ and $G$ have no explicit time dependence, neither does $\{F, G\}$, so by equation (3.75)
$$ D(\{F, G\} \circ \sigma) = 0, \tag{3.91} $$
and $\{F, G\}$ is a conserved quantity. The Poisson bracket of two conserved quantities is also a conserved quantity.
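A familiar instance is angular momentum. The sketch assumes a hypothetical isotropic oscillator $H = (p \cdot p + r \cdot r)/2$ (chosen because it is polynomial, so exact arithmetic applies), for which $L_x$ and $L_y$ are conserved. With the bracket convention of (3.76), $\{L_x, L_y\} = L_z$, and the result above predicts that $L_z$ is conserved as well. The polynomial helpers repeat those of the previous sketch so that this block stands alone.

```python
from fractions import Fraction
from itertools import product

# Exact polynomials in (x, y, z, p_x, p_y, p_z), stored as {exponent tuple: coefficient}.
N = 3

def add(a, b, sign=1):
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0) + sign * c
        if out[e] == 0:
            del out[e]
    return out

def mul(a, b):
    out = {}
    for (ea, ca), (eb, cb) in product(a.items(), b.items()):
        e = tuple(u + v for u, v in zip(ea, eb))
        out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}

def deriv(a, i):
    return {e[:i] + (e[i] - 1,) + e[i + 1:]: c * e[i] for e, c in a.items() if e[i] > 0}

def poisson(f, g):
    out = {}
    for i in range(N):
        out = add(out, mul(deriv(f, i), deriv(g, N + i)))
        out = add(out, mul(deriv(f, N + i), deriv(g, i)), sign=-1)
    return out

def var(i):
    return {tuple(1 if j == i else 0 for j in range(2 * N)): Fraction(1)}

x, y, z, px, py, pz = (var(i) for i in range(2 * N))
Lx = add(mul(y, pz), mul(z, py), sign=-1)
Ly = add(mul(z, px), mul(x, pz), sign=-1)
Lz = add(mul(x, py), mul(y, px), sign=-1)

# Hypothetical isotropic oscillator with m = k = 1: H = (p.p + r.r) / 2.
half = {(0,) * (2 * N): Fraction(1, 2)}
H = {}
for s in (x, y, z, px, py, pz):
    H = add(H, mul(half, mul(s, s)))

print(poisson(Lx, H), poisson(Ly, H), poisson(Lx, Ly) == Lz)
```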


3.3 One Degree of Freedom 


The solutions of time-independent systems with one degree of free- 
dom can be found by quadrature. Such systems conserve the 
Hamiltonian: the Hamiltonian has a constant value on each re- 
alizable trajectory. We can use this constraint to eliminate the 
momentum in favor of the coordinate. Thus Hamilton’s equations 
reduce to a single equation Dq = f(q). The solution q can be 
expressed as a definite integral. 
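For example, for a pendulum released from rest at angle $\theta_0$, energy conservation gives $D\theta = \sqrt{2(g/l)(\cos\theta - \cos\theta_0)}$, and the period is the definite integral $T = 4\int_0^{\theta_0} d\theta / D\theta$. The sketch below (hypothetical parameters; the substitution $\theta = \theta_0 \sin u$ and the midpoint rule are our choices) evaluates that quadrature and compares it with the standard closed form $T = 4\sqrt{l/g}\,K(\sin(\theta_0/2))$, computing the complete elliptic integral $K$ with the arithmetic–geometric mean.

```python
import math

# Hypothetical pendulum (not from the text): H = p^2/(2 m l^2) - m g l cos(theta).
# Released from rest at theta0, energy conservation gives
#   D theta = sqrt(2 g / l (cos(theta) - cos(theta0))),
# so the period is the definite integral T = 4 * integral_0^theta0 dtheta / D theta.
g, l = 9.8, 1.0

def period_by_quadrature(theta0, n=400):
    """Midpoint rule after substituting theta = theta0 sin(u), which removes
    the integrable singularity at the turning point theta = theta0."""
    def integrand(u):
        theta = theta0 * math.sin(u)
        rate = math.sqrt(2 * g / l * (math.cos(theta) - math.cos(theta0)))
        return theta0 * math.cos(u) / rate
    h = (math.pi / 2) / n
    return 4 * h * sum(integrand((i + 0.5) * h) for i in range(n))

def agm(a, b):
    while abs(a - b) > 1e-15 * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def period_exact(theta0):
    """Closed form T = 4 sqrt(l/g) K(sin(theta0/2)), with K via the AGM."""
    K = math.pi / (2 * agm(1.0, math.cos(theta0 / 2)))
    return 4 * math.sqrt(l / g) * K

for theta0 in (0.1, 1.0, 2.5):
    print(theta0, period_by_quadrature(theta0), period_exact(theta0))
```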

A geometric view reveals more structure. Time-independent 
systems with one degree of freedom have a two-dimensional phase 
space. Energy is conserved, so all orbits are level curves of the 
Hamiltonian. The possible orbit types are restricted to curves 
that are contours of a real-valued function. The possible orbits 
are paths of constant altitude in the mountain range on the phase 
plane described by the Hamiltonian. 

Only a small number of characteristic features are possible. There are points that are stable equilibria of the dynamical system. These are the peaks and pits of the Hamiltonian mountain range. These equilibria are stable in the sense that neighboring trajectories on nearby contours stay close to the equilibrium point. There are orbits that trace simple closed curves on contours that surround a peak or pit, or perhaps several peaks. There are also trajectories that lie on contours that cross at a saddle point. The crossing point is an unstable equilibrium. It is unstable in the sense that neighboring trajectories leave the vicinity of the equilibrium point. Contours that cross at saddle points are called separatrices, because each such contour “separates” two regions of distinct behavior.¹⁷

At every point Hamilton’s equations give a unique rate of evo- 
lution. Hamilton’s equations direct the system to move perpen- 
dicular to the gradient of the Hamiltonian. At the peaks, pits, 
and saddle points, the gradient of the Hamiltonian is zero, so ac- 
cording to Hamilton’s equations these are fixed points. At other 
points, the gradient of the Hamiltonian is non-zero, so according 
to Hamilton’s equations the rate of evolution is non-zero. Trajec- 
tories evolve along the contours of the Hamiltonian. Trajectories 
on simple closed contours periodically trace the contour. At a 
saddle point contours cross. The gradient of the Hamiltonian is 
zero at the saddle point so a system started at the saddle point 
does not leave the saddle point. On the separatrix away from the 
saddle point the gradient of the Hamiltonian is not zero so trajec- 
tories evolve along the contour. Trajectories on the separatrix are 
asymptotic forward or backward in time to a saddle point. Going 
forward or backward in time such trajectories forever approach 
an unstable equilibrium but never reach it. If the phase space is 
bounded, asymptotic trajectories that lie on contours of a smooth 
Hamiltonian are always asymptotic to unstable equilibria at both 
ends (but they may be different equilibria). 

17 Separatrices is the plural of separatrix.

Figure 3.4 The phase plane of the pendulum has three regions displaying two distinct kinds of behavior. In this figure there are a number of different trajectories. Trajectories lie on the contours of the Hamiltonian. Trajectories may oscillate, making ovoid curves around the equilibrium point, or they may circulate, producing wavy tracks outside the eye-shaped region. The eye-shaped region is delimited by the separatrix. This pendulum has length 1 m, a bob of mass 1 kg, and the acceleration of gravity is $9.8\,\mathrm{m\,s^{-2}}$.

These orbit types are all illustrated by the prototypical phase plane of the pendulum (see figure 3.4). The solutions lie on contours of the Hamiltonian. There are three regions of the phase plane; in each the motion is qualitatively different. In the central region the pendulum oscillates; above this there is a region in which the pendulum circulates in one direction; and below the oscillation region the pendulum circulates in the other direction. In the center of the oscillation region there is a stable equilibrium, at which the pendulum is hanging motionless. At the boundaries between these regions the pendulum is asymptotic to the unstable equilibrium, at which the pendulum is standing upright.¹⁸ There are two asymptotic trajectories, corresponding to the two ways the equilibrium can be approached. Each of these is also asymptotic to the unstable fixed point going backward in time.
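Because every trajectory lies on a contour of the Hamiltonian, the region a phase-space point belongs to can be read off from its energy alone. The sketch below uses the figure’s parameters (1 m, 1 kg, $9.8\,\mathrm{m\,s^{-2}}$) and the pendulum Hamiltonian $H = p^2/(2ml^2) - mgl\cos\theta$ (our assumption for the form of $H$): energies below the upright value $mgl$ oscillate, energies above it circulate in the direction of $p$, and energy exactly $mgl$ lies on the separatrix. The string labels and the tolerance are illustrative choices.

```python
import math

# Parameters quoted for figure 3.4: length 1 m, bob mass 1 kg, g = 9.8 m/s^2.
m, l, g = 1.0, 1.0, 9.8

def H(theta, p):
    return p**2 / (2 * m * l**2) - m * g * l * math.cos(theta)

E_STABLE = H(0.0, 0.0)            # hanging motionless: -m g l
E_SEPARATRIX = H(math.pi, 0.0)    # standing upright:  +m g l

def classify(theta, p, tol=1e-9):
    """Classify the orbit through (theta, p) by which contour of H it lies on."""
    E = H(theta, p)
    if abs(E - E_STABLE) <= tol:
        return "stable equilibrium"
    if abs(E - E_SEPARATRIX) <= tol:
        return "separatrix"
    if E < E_SEPARATRIX:
        return "oscillating"
    return "circulating, p > 0" if p > 0 else "circulating, p < 0"

for point in [(0.0, 0.0), (1.0, 0.0), (0.0, 7.0), (0.0, -7.0), (math.pi, 0.0)]:
    print(point, classify(*point))
```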


18 The pendulum has only one unstable equilibrium. Remember that the coordinate is an angle.

3.4 Phase Space Reduction

Our motivation for the development of Hamilton’s equations was to focus attention on the quantities that are sometimes conserved: the momenta and the energy. In the Hamiltonian formulation the
generalized configuration coordinates and the conjugate momenta
comprise the state of the system at a given time. We know from 
the Lagrangian formulation that if the Lagrangian does not de- 
pend on some coordinate then the conjugate momentum is con- 
served. This is also true in the Hamiltonian formulation, but there 
is a distinct advantage to the Hamiltonian formulation. In the La- 
grangian formulation the knowledge of the conserved momentum 
does not immediately lead to any simplification of the problem, 
but in the Hamiltonian formulation the fact that momenta are 
conserved gives an immediate reduction of the dimension of the 
system to be solved. In fact, if a coordinate does not appear 
in the Hamiltonian then the dimension of the system of coupled 
equations that are remaining to be solved is reduced by two— 
the coordinate does not appear and the conjugate momentum is 
constant. 

Let $H(t, q, p)$ be a Hamiltonian for some problem with an $n$-dimensional configuration space and $2n$-dimensional phase space. Suppose the Hamiltonian does not depend upon the $i$th coordinate $q^i$: $(\partial_1 H)_i = 0$.¹⁹ According to Hamilton’s equations the conjugate momentum $p_i$ is conserved. Hamilton’s equations of motion for the remaining $2n - 2$ phase-space variables do not involve $q^i$ (because it does not appear in the Hamiltonian), and $p_i$ is a constant. Thus the dimension of the difficult part of the problem, the part that involves the solution of coupled ordinary differential equations, is reduced by two. The remaining equation governing the evolution of $q^i$ in general depends on all the other variables, but once the reduced problem has been solved, the equation of motion for $q^i$ can be written so as to give $Dq^i$ explicitly as a function of time. We can then find $q^i$ as a definite integral of this function.²⁰

Contrast this result with analogous results for more general 
systems of differential equations. There are two independent sit- 
uations. 


19 If a Lagrangian does not depend on a particular coordinate then neither does the corresponding Hamiltonian, because the coordinate is a passive variable in the Legendre transform. Such a Hamiltonian is said to be cyclic in that coordinate.

20 Traditionally, when a problem has been reduced to the evaluation of a definite integral it is said to be reduced to a “quadrature.” Thus, the determination of the evolution of a cyclic coordinate $q^i$ is reduced to a problem of quadrature.
One situation is that we know a constant of the motion. In general, constants of the motion can be used to reduce the dimension of the unsolved part of the problem by one. To see this, let the system of equations be
$$ Dz^i = F^i(z^1, z^2, \ldots, z^m), \tag{3.92} $$
where $m$ is the dimension of the system. Assume we know some constant of the motion
$$ C = G(z^1, z^2, \ldots, z^m). \tag{3.93} $$
At least locally, we expect that we can use this equation to solve for $z^m$ in terms of all the other variables, and use this solution to eliminate the dependence on $z^m$. The first $m - 1$ equations then depend only upon the first $m - 1$ variables. The dimension of the system of equations to be solved is reduced by one. After the solution for the other variables has been found, $z^m$ can be found using the constant of the motion.

Another situation is that one of the variables, say $z^i$, does not appear in the equations of motion (but there is an equation for $Dz^i$). In this case the equations for the other variables form an independent set of equations of one dimension less than the original system. After these are solved, the remaining equation for $z^i$ can be solved by definite integration.

In both situations the dimension of the system of coupled equa- 
tions is reduced by one. What is different about Hamilton’s equa- 
tions is that these two situations often come together. If a Hamil- 
tonian for a system does not depend on a particular coordinate 
then the equations of motion for the other coordinates and mo- 
menta do not depend on that coordinate. Furthermore, the mo- 
mentum conjugate to that coordinate is a constant of the motion. 
An added benefit is that the use of this constant of the motion 
to reduce the dimension of the remaining equations is automatic 
in the Hamiltonian formulation. The conserved momentum is a 
state variable and just a parameter in the remaining equations. 

When a generalized coordinate does not appear in the La- 
grangian there is some continuous symmetry that is being ex- 
pressed. The results on the reduction of the phase space show us 
that in the formulation of a problem for which some symmetry is 
apparent it will probably be to our advantage if we choose a coor- 
dinate system that explicitly incorporates the symmetry, making 
the Hamiltonian independent of a coordinate. Then the dimension of the phase space of the coupled system will be reduced by two for every coordinate that does not appear in the Hamiltonian.²¹


Motion in a central potential 

Consider the motion of a particle of mass $m$ in a central potential. A natural choice for generalized coordinates that reflects the symmetry is polar coordinates. A Lagrangian is (equation 1.67):
$$ L(t; r, \phi; \dot{r}, \dot{\phi}) = \tfrac{1}{2} m (\dot{r}^2 + r^2 \dot{\phi}^2) - V(r). \tag{3.94} $$
The momenta are $p_r = m\dot{r}$ and $p_\phi = m r^2 \dot{\phi}$. The kinetic energy is a homogeneous quadratic form in the velocities, so the Hamiltonian is $T + V$ with the velocities rewritten in terms of the momenta:
$$ H(t; r, \phi; p_r, p_\phi) = \frac{p_r^2}{2m} + \frac{p_\phi^2}{2mr^2} + V(r). \tag{3.95} $$
Hamilton’s equations are:
$$ \begin{aligned} Dr &= \frac{p_r}{m} \\ D\phi &= \frac{p_\phi}{mr^2} \\ Dp_r &= \frac{p_\phi^2}{mr^3} - DV(r) \\ Dp_\phi &= 0. \end{aligned} \tag{3.96} $$


The potential energy depends on the distance from the origin, $r$, as does the kinetic energy in polar coordinates, but neither the potential energy nor the kinetic energy depends on the polar angle $\phi$. The angle $\phi$ does not appear in the Lagrangian, so we know that $p_\phi$, the momentum conjugate to $\phi$, is conserved along realizable trajectories. The fact that $p_\phi$ is constant along realizable paths is expressed by one of Hamilton’s equations. That $p_\phi$ has a constant value is immediately made use of in the other Hamilton’s equations: the remaining equations are a self-contained subsystem with constant $p_\phi$. To make a lower-dimensional subsystem in the Lagrangian formulation we have to use each conserved momentum to eliminate one of the other state variables, as we did for the axisymmetric top (see section 2.10).

21 It is not always possible to choose a set of generalized coordinates in which all symmetries are simultaneously manifest. For these systems, the reduction of the phase space is more complicated. We have already encountered such a problem: the motion of a free rigid body. The system is invariant under a rotation about any axis, yet no single coordinate system can reflect this symmetry. Nevertheless, we have already found that the dynamics is described by a system of lower dimension than the full phase space: the Euler equations.

We can check our derivations with the computer. A procedure 
implementing the Lagrangian has already been introduced (below 
equation 1.67). We can use this to get the Hamiltonian: 


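(show-expression
 ((Lagrangian->Hamiltonian
   (L-central-polar 'm (literal-function 'V)))
  (up 't (up 'r 'phi) (down 'p_r 'p_phi))))

$$ V(r) + \frac{p_\phi^2}{2mr^2} + \frac{p_r^2}{2m} $$

and to develop Hamilton’s equations:

(show-expression
 (((Hamilton-equations
    (Lagrangian->Hamiltonian
     (L-central-polar 'm (literal-function 'V))))
   (up (literal-function 'r)
       (literal-function 'phi))
   (down (literal-function 'p_r)
         (literal-function 'p_phi)))
  't))

$$ \begin{pmatrix} 0 \\ \begin{pmatrix} Dr(t) - \dfrac{p_r(t)}{m} \\[4pt] D\phi(t) - \dfrac{p_\phi(t)}{m\,(r(t))^2} \end{pmatrix} \\ \begin{bmatrix} Dp_r(t) - \dfrac{(p_\phi(t))^2}{m\,(r(t))^3} + DV(r(t)) & Dp_\phi(t) \end{bmatrix} \end{pmatrix} $$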
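The reduction can be seen in a small numerical sketch. It assumes a hypothetical Kepler-like potential $V(r) = -\kappa/r$ with $m = \kappa = 1$ and uses classical RK4 (our choice). The pair $(r, p_r)$ evolves under equations (3.96) with $p_\phi$ entering only as a fixed parameter, while $\phi$ is accumulated alongside from $D\phi = p_\phi/(mr^2)$ and never feeds back, which is the quadrature step described above. For $p_r = 0$ and $r_0 = p_\phi^2/(m\kappa)$ the reduced system sits at an equilibrium (a circular orbit); for an eccentric bound orbit the reduced Hamiltonian should stay constant.

```python
import math

# Hypothetical Kepler-like potential (not from the text): V(r) = -kappa / r.
m, kappa = 1.0, 1.0

def DV(r):
    return kappa / r**2

def H_reduced(r, p_r, p_phi):
    """Equation (3.95) with phi absent and p_phi a fixed parameter."""
    return p_r**2 / (2 * m) + p_phi**2 / (2 * m * r**2) - kappa / r

def rhs(state, p_phi):
    """Equations (3.96): (r, p_r) form a closed subsystem; D phi only reads r."""
    r, p_r, phi = state
    return (p_r / m, p_phi**2 / (m * r**3) - DV(r), p_phi / (m * r**2))

def integrate(r, p_r, p_phi, dt=1e-3, steps=10_000, phi=0.0):
    """Classical RK4; phi is carried along but never feeds back into r or p_r."""
    s = (r, p_r, phi)
    for _ in range(steps):
        k1 = rhs(s, p_phi)
        k2 = rhs(tuple(x + dt / 2 * d for x, d in zip(s, k1)), p_phi)
        k3 = rhs(tuple(x + dt / 2 * d for x, d in zip(s, k2)), p_phi)
        k4 = rhs(tuple(x + dt * d for x, d in zip(s, k3)), p_phi)
        s = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
    return s

# Circular orbit: p_r = 0 and r0 = p_phi^2 / (m kappa) make D p_r vanish.
r_c, pr_c, phi_c = integrate(1.0, 0.0, 1.0, steps=1000)
# Eccentric (bound) orbit: check that the reduced Hamiltonian is conserved.
r_e, pr_e, phi_e = integrate(1.0, 0.3, 1.1)
print(r_c, phi_c, H_reduced(r_e, pr_e, 1.1) - H_reduced(1.0, 0.3, 1.1))
```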


Axisymmetric top 
We reconsider the axisymmetric top (see section 2.10) from the Hamiltonian point of view. Recall that a top is a rotating rigid body, one point of which is fixed in space. The center of mass is not at the fixed point, and there is a uniform gravitational field. An axisymmetric top is a top with an axis of symmetry. We consider here an axisymmetric top with the fixed point on the symmetry axis.

The axisymmetric top has two continuous symmetries that we would like to exploit. Neither the kinetic nor the potential energy is sensitive to the orientation of the top about the symmetry axis. The kinetic and potential energy are also insensitive to a rotation of the physical system about the vertical axis, because the gravitational field is uniform. We take advantage of these symmetries by choosing coordinates that naturally express them. We already have an appropriate coordinate system that does the job: the Euler angles. We choose the reference orientation of the top so that the symmetry axis is vertical. The first Euler angle $\psi$ expresses a rotation about the symmetry axis. The next Euler angle $\theta$ is the tilt of the symmetry axis of the top from the vertical. The third Euler angle $\phi$ expresses a rotation of the top about the fixed $z$ axis. The symmetries of the problem imply that the first and third Euler angles do not appear in the Hamiltonian. As a consequence the momenta conjugate to these angles are conserved quantities. The problem of determining the motion of the axisymmetric top is reduced to the problem of determining the evolution of $\theta$ and $p_\theta$. Let’s work out the details.

In terms of Euler angles a Lagrangian for the axisymmetric top 
is (see section 2.10): 


(define ((L-axisymmetric-top A C gMR) local)
  (let ((q (coordinate local))
        (qdot (velocity local)))
    (let ((theta (ref q 0))
          (thetadot (ref qdot 0))
          (phidot (ref qdot 1))
          (psidot (ref qdot 2)))
      (+ (* 1/2 A
            (+ (square thetadot)
               (square (* phidot (sin theta)))))
         (* 1/2 C
            (square (+ psidot (* phidot (cos theta)))))
         (* -1 gMR (cos theta))))))


where gMR is the product of the gravitational acceleration, the 
mass of the top, and the distance from the point of support to the 
center of mass. The Hamiltonian is nicer than we have a right to
expect: 


(show-expression
 ((Lagrangian->Hamiltonian (L-axisymmetric-top 'A 'C 'gMR))
  (up 't
      (up 'theta 'phi 'psi)
      (down 'p_theta 'p_phi 'p_psi))))


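$$\frac{1}{2}\frac{p_\psi^2}{C} + \frac{1}{2}\frac{p_\psi^2(\cos\theta)^2}{A(\sin\theta)^2} + \frac{1}{2}\frac{p_\varphi^2}{A(\sin\theta)^2} - \frac{p_\varphi p_\psi\cos\theta}{A(\sin\theta)^2} + \frac{1}{2}\frac{p_\theta^2}{A} + gMR\cos\theta$$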


Note that the angles φ and ψ do not appear in the Hamiltonian,
as expected. Thus the momenta p_φ and p_ψ are constants of the
motion.
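The sketch below shows where the printed Hamiltonian comes from. It is written in Python because the book's Scheme needs scmutils to run. It transcribes L-axisymmetric-top, computes the momenta by differentiating with respect to the velocities by hand, and checks numerically that p·v − L agrees with the closed form above. The helper names, the parameter values and the sample state are illustrative assumptions.

```python
import math

def lagrangian(theta, thetadot, phidot, psidot, A, C, gMR):
    # Direct transcription of L-axisymmetric-top.
    return (0.5 * A * (thetadot**2 + (phidot * math.sin(theta))**2)
            + 0.5 * C * (psidot + phidot * math.cos(theta))**2
            - gMR * math.cos(theta))

def momenta(theta, thetadot, phidot, psidot, A, C):
    # Partial derivatives of the Lagrangian with respect to thetadot, phidot, psidot.
    p_psi = C * (psidot + phidot * math.cos(theta))
    p_phi = A * phidot * math.sin(theta)**2 + p_psi * math.cos(theta)
    return A * thetadot, p_phi, p_psi

def hamiltonian(theta, p_theta, p_phi, p_psi, A, C, gMR):
    # The closed form printed above; phi and psi do not appear.
    s2, c = math.sin(theta)**2, math.cos(theta)
    return (p_psi**2 / (2 * C) + p_psi**2 * c**2 / (2 * A * s2)
            + p_phi**2 / (2 * A * s2) - p_phi * p_psi * c / (A * s2)
            + p_theta**2 / (2 * A) + gMR * c)

A, C, gMR = 0.000328, 0.000066, 0.0456      # illustrative values
state = (0.7, 1.3, -2.1, 95.0)              # theta, thetadot, phidot, psidot
p = momenta(*state, A, C)
legendre_value = (sum(pk * vk for pk, vk in zip(p, state[1:]))
                  - lagrangian(*state, A, C, gMR))
closed_form_value = hamiltonian(state[0], *p, A, C, gMR)
print(legendre_value, closed_form_value)
```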

For given values of p_φ and p_ψ we must determine the evolution
of θ and p_θ. The Hamiltonian for θ and p_θ is effectively a
one-degree-of-freedom Hamiltonian, and this Hamiltonian does
not involve the time. Thus the value of the Hamiltonian is
conserved along realizable trajectories. This means that the possible
trajectories of θ and p_θ can be represented as contours of the
Hamiltonian. This gives us a big picture of the possible types of
motion and their relationship, for given values of p_φ and p_ψ.

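If the top is standing vertically then p_φ = p_ψ: at θ = 0 the
φ and ψ rotations are about the same axis, and
p_φ = Aφ̇ sin²θ + p_ψ cos θ reduces to p_ψ. Let's concentrate on
the case that p_φ = p_ψ, and define p = p_φ = p_ψ. The
Hamiltonian becomes (after a little trigonometric simplification)

$$H = \frac{p_\theta^2}{2A} + \frac{p^2}{2C} + \frac{p^2}{2A}\tan^2\frac{\theta}{2} + gMR\cos\theta. \tag{3.97}$$

Defining the effective potential energy

$$U_{\text{eff}}(\theta) = \frac{p^2}{2C} + \frac{p^2}{2A}\tan^2\frac{\theta}{2} + gMR\cos\theta, \tag{3.98}$$

which parametrically depends on p, A, C, and gMR, the
Hamiltonian is

$$H = \frac{p_\theta^2}{2A} + U_{\text{eff}}(\theta). \tag{3.99}$$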
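The "little trigonometric simplification" behind (3.97) rests on the identity (1 − cos θ)²/sin²θ = (1 − cos θ)/(1 + cos θ) = tan²(θ/2). The sketch below checks numerically, at a few arbitrary angles, that the full top Hamiltonian with p_φ = p_ψ = p matches (3.97). The parameter values and function names are illustrative assumptions.

```python
import math

def h_full(theta, p_theta, p_phi, p_psi, A, C, gMR):
    # Expanded axisymmetric-top Hamiltonian (no phi or psi dependence).
    s2, c = math.sin(theta)**2, math.cos(theta)
    return (p_psi**2 / (2 * C) + p_psi**2 * c**2 / (2 * A * s2)
            + p_phi**2 / (2 * A * s2) - p_phi * p_psi * c / (A * s2)
            + p_theta**2 / (2 * A) + gMR * c)

def h_reduced(theta, p_theta, p, A, C, gMR):
    # Equation (3.97), valid when p_phi = p_psi = p.
    return (p_theta**2 / (2 * A) + p**2 / (2 * C)
            + p**2 / (2 * A) * math.tan(theta / 2)**2 + gMR * math.cos(theta))

A, C, gMR, p, p_theta = 0.000328, 0.000066, 0.0456, 0.00594, 0.001
max_diff = max(abs(h_full(th, p_theta, p, p, A, C, gMR)
                   - h_reduced(th, p_theta, p, A, C, gMR))
               for th in (0.1, 0.5, 1.0, 2.0, 3.0))
print(max_diff)
```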
Figure 3.5 The effective potential energy of the axisymmetric top as
a function of the angle. The top curve is for an axial angular momentum
p > p_c. For this value the top is stable standing vertically. The bottom
curve is for p < p_c. Here, the top is not stable standing vertically.
The middle curve is for p at the critical angular momentum. We see
the bifurcation of the stable equilibrium of the sleeping top into three
equilibrium points, one of them unstable.


If p is large, U_eff has a single minimum at θ = 0, as we can see in
figure 3.5. For small p there is a minimum for finite positive θ and
a symmetrical minimum for negative θ; there is a local maximum
at θ = 0. There is a critical value of p at which θ = 0 changes from
being a minimum to a local maximum. Denote the critical value
by p_c. A simple calculation shows p_c = √(4gMRA). For θ = 0
we have p = Cω, where ω is the rotation rate. Thus to p_c there
corresponds a critical rotation rate

$$\omega_c = \frac{\sqrt{4gMRA}}{C}. \tag{3.100}$$


For ω > ω_c the top can stand vertically; for ω < ω_c the top
falls if slightly displaced from the vertical. The top that stands
vertically is called the “sleeping” top. For a more realistic top,
friction gradually slows the rotation; the rotation rate eventually
falls below the critical rotation rate and the top “wakes up.”
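The critical values can be checked numerically. The sketch below evaluates (3.100) using assumed parameter values (A = 0.000328 kg m², C = 0.000066 kg m², gMR = 0.0456 kg m² s⁻²). It then estimates the curvature U_eff″(0) by a central difference for one rotation rate above ω_c and one below. A positive curvature means θ = 0 is a minimum, so the sleeping top is stable there. The helper names are hypothetical.

```python
import math

def u_eff(theta, p, A, C, gMR):
    # Effective potential (3.98) for the case p_phi = p_psi = p.
    return p**2 / (2 * C) + p**2 / (2 * A) * math.tan(theta / 2)**2 + gMR * math.cos(theta)

def curvature_at_vertical(p, A, C, gMR, h=1e-3):
    # Central-difference estimate of U_eff''(0); positive means theta = 0 is a minimum.
    return (u_eff(h, p, A, C, gMR) - 2 * u_eff(0.0, p, A, C, gMR)
            + u_eff(-h, p, A, C, gMR)) / h**2

def critical_values(A, C, gMR):
    # p_c = sqrt(4 gMR A); at theta = 0, p = C omega, so omega_c = p_c / C (3.100).
    p_c = math.sqrt(4 * gMR * A)
    return p_c, p_c / C

A, C, gMR = 0.000328, 0.000066, 0.0456   # assumed: kg m^2, kg m^2, kg m^2 s^-2
p_c, omega_c = critical_values(A, C, gMR)
stable = {omega: curvature_at_vertical(C * omega, A, C, gMR) > 0
          for omega in (90.0, 130.0)}
print(f"p_c = {p_c:.5f} kg m^2/s, omega_c = {omega_c:.1f} rad/s")
print(stable)   # True means the sleeping top is stable at that rotation rate
```

With these assumed values, ω_c comes out near the 117.2 rad/s quoted for figure 3.6. As a result, 130 rad/s lies above the critical rate and 90 rad/s below it.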
Figure 3.6 The θ, p_θ phase plane for the axisymmetric top with
p_φ = p_ψ and ω = 130 rad/s. The parameters are A = 0.000328 kg m²,
C = 0.000066 kg m², gMR = 0.0456 kg m² s⁻². For these parameters the
critical frequency ω_c is about 117.2 rad/s.




Figure 3.7 The θ, p_θ phase plane for the axisymmetric top with
p_φ = p_ψ and ω = 90 rad/s. The other parameters are as before.






Figure 3.8 The θ, p_θ phase plane for the axisymmetric top with p_φ >
p_ψ. Most of the parameters are the same as before, but here
p_φ = 0.00726 kg m² s⁻¹ and p_ψ = 0.00594 kg m² s⁻¹.


We get additional insight into the sleeping top and the awake
top by looking at the trajectories in the θ, p_θ phase plane. The
trajectories in this plane are simply contours of the Hamiltonian,
because the Hamiltonian is conserved. Figure 3.6 shows a phase
portrait for ω > ω_c. All of the trajectories are loops around the
vertical (θ = 0). Displacing the top slightly from the vertical
simply places the top on a nearby loop, so the top stays nearly
vertical. Figure 3.7 shows the phase portrait for ω < ω_c. Here
the vertical position is an unstable equilibrium. The trajectories
that approach the vertical are asymptotic: they take an infinite
amount of time to reach it, just as a pendulum with exactly the
right initial conditions can approach the vertical but never reach
it. If the top is displaced slightly from the vertical then the
trajectories loop around another center with nonzero θ. A top
started at the center point of the loop stays there, and one started
near this equilibrium point loops stably around it. Thus we see
that when the top “wakes up” the vertical is unstable, but the top
does not fall to the ground. Rather, it oscillates around a new
equilibrium.
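Here is a rough numerical counterpart of figures 3.6 and 3.7. The sketch integrates Hamilton's equations for (3.99), namely Dθ = p_θ/A and Dp_θ = −U_eff′(θ), starting at rest slightly off the vertical, and reports the largest tilt reached. It uses a fixed-step fourth-order Runge-Kutta rule; that choice is an assumption, and any accurate integrator would do. The parameter values (A = 0.000328 kg m², C = 0.000066 kg m², gMR = 0.0456 kg m² s⁻²) are also assumed. With them ω_c ≈ 117 rad/s, so 130 rad/s is above the critical rate and 90 rad/s is below it.

```python
import math

def u_eff_prime(theta, p, A, gMR):
    # dU_eff/dtheta for (3.98); d(tan^2(theta/2))/dtheta = tan(theta/2) / cos^2(theta/2).
    return ((p**2 / (2 * A)) * math.tan(theta / 2) / math.cos(theta / 2)**2
            - gMR * math.sin(theta))

def rk4_step(f, y, dt):
    # One classical fourth-order Runge-Kutta step for an autonomous system.
    k1 = f(y)
    k2 = f([u + dt / 2 * k for u, k in zip(y, k1)])
    k3 = f([u + dt / 2 * k for u, k in zip(y, k2)])
    k4 = f([u + dt * k for u, k in zip(y, k3)])
    return [u + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for u, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]

def max_tilt(omega, theta0=0.01, A=0.000328, C=0.000066, gMR=0.0456,
             t_end=3.0, dt=1e-3):
    # Hamilton's equations for H = p_theta^2/(2A) + U_eff(theta), equation (3.99),
    # for a top spinning at omega (p = C omega), released at rest at theta0.
    p = C * omega
    rhs = lambda y: [y[1] / A, -u_eff_prime(y[0], p, A, gMR)]
    y, largest = [theta0, 0.0], abs(theta0)
    for _ in range(int(round(t_end / dt))):
        y = rk4_step(rhs, y, dt)
        largest = max(largest, abs(y[0]))
    return largest

above, below = max_tilt(130.0), max_tilt(90.0)
print(above, below)
```

Above the critical rate the tilt stays near its initial 0.01 rad. Below it, the tilt swings out to a large angle on a loop around the off-center equilibrium, as in figure 3.7.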

It is also interesting to consider the axisymmetric top when
p_φ ≠ p_ψ. Consider the case p_φ > p_ψ. Some trajectories in the θ,
p_θ plane are shown in figure 3.8. Note that in this case trajectories
do not go through θ = 0. The phase portrait for p_φ < p_ψ is similar
and will not be shown.

We have reduced the motion of the axisymmetric top to quadra- 
tures by choosing coordinates that express the symmetries. It 
turns out that the resulting integrals can be expressed in terms of 
elliptic functions. Thus, the axisymmetric top can be analytically 
solved. We do not dwell on this solution because it is not very il- 
luminating. In fact, most problems cannot be solved analytically, 
so there is not much profit in dwelling on the analytic solution of 
one of the rare problems which is analytically solvable. Rather, 
our discussion has focused on the geometry of the solutions in the 
phase space, and the use of integrals to reduce the dimension of 
the problem. With the phase space portrait we have found some 
interesting qualitative features of the motion of the top. 


Exercise 3.8: Sleeping top 


Verify that the critical angular velocity above which an axisymmetric 
top can sleep is given by equation (3.100). 


3.4.1 Lagrangian Reduction 


Suppose there are cyclic coordinates. In the Hamiltonian formula- 
tion the equations of motion for the coordinates and momenta for 
the other degrees of freedom form a self-contained subsystem, in
which the momenta conjugate to the cyclic coordinates are param- 
eters. We can form a Lagrangian for this subsystem by performing 
a Legendre transform of the reduced Hamiltonian. Alternatively, 
we can start with the full Lagrangian and perform a Legendre 
transform only for those coordinates that are cyclic. The equa- 
tions of motion are Hamilton’s equations for those variables that 
are transformed and Lagrange’s equations for the others. The 
momenta conjugate to the cyclic coordinates are conserved and 
can be treated as parameters in the Lagrangian for the remaining 
coordinates. 

Divide the tuple q of coordinates into two subtuples q = (x, y).
Assume L(t; x, y; v_x, v_y) is a Lagrangian for the system. Define
the Routhian R as the Legendre transform of L with respect to
the v_y slot:

$$p_y = \partial_{2,1} L(t; x, y; v_x, v_y) \tag{3.101}$$

$$p_y v_y = R(t; x, y; v_x, p_y) + L(t; x, y; v_x, v_y) \tag{3.102}$$

$$v_y = \partial_{2,1} R(t; x, y; v_x, p_y) \tag{3.103}$$

$$0 = \partial_0 R(t; x, y; v_x, p_y) + \partial_0 L(t; x, y; v_x, v_y) \tag{3.104}$$

$$0 = \partial_1 R(t; x, y; v_x, p_y) + \partial_1 L(t; x, y; v_x, v_y) \tag{3.105}$$

$$0 = \partial_{2,0} R(t; x, y; v_x, p_y) + \partial_{2,0} L(t; x, y; v_x, v_y) \tag{3.106}$$


To define the function R we must solve equation (3.101) for v_y
in terms of the other variables, and substitute this into
equation (3.102).
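As a small worked example, take the central-force Lagrangian in polar coordinates with x = r and y = φ, for a potential V(r). Solving (3.101) for the φ velocity and substituting into (3.102) gives

$$L(t; r, \varphi; \dot r, \dot\varphi) = \tfrac12 m\left(\dot r^2 + r^2\dot\varphi^2\right) - V(r)$$

$$p_\varphi = \partial_{2,1}L = mr^2\dot\varphi \quad\Longrightarrow\quad \dot\varphi = \frac{p_\varphi}{mr^2}$$

$$R(t; r, \varphi; \dot r, p_\varphi) = p_\varphi\dot\varphi - L = \frac{p_\varphi^2}{2mr^2} - \tfrac12 m\dot r^2 + V(r)$$

As a check of (3.103), ∂_{2,1}R = p_φ/(mr²) = φ̇. This R does not depend on φ. If p_φ is held at a constant value c, then −R = ½mṙ² − c²/(2mr²) − V(r), which is the familiar radial Lagrangian with its centrifugal term.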

Define the state path Ξ:

$$\Xi(t) = (t; x(t), y(t); Dx(t), p_y(t)), \tag{3.107}$$

where

$$p_y(t) = \partial_{2,1} L(t; x(t), y(t); Dx(t), Dy(t)). \tag{3.108}$$


Realizable paths satisfy the equations of motion

$$D(\partial_{2,0}R \circ \Xi)(t) = \partial_{1,0}R \circ \Xi(t) \tag{3.109}$$

$$Dy(t) = \partial_{2,1}R \circ \Xi(t) \tag{3.110}$$

$$Dp_y(t) = -\partial_{1,1}R \circ \Xi(t), \tag{3.111}$$

which are Lagrange's equations for x and Hamilton's equations
for y and p_y.

Now suppose that the Lagrangian is cyclic in y. Then ∂_{1,1}L =
∂_{1,1}R = 0, and p_y(t) is a constant c on any realizable path.
Equation (3.109) does not depend on y, by assumption, and we can
replace p_y by its constant value c. So equation (3.109) forms a
closed subsystem for the path x. The Lagrangian L_c

$$L_c(t, x, v_x) = -R(t; x, \bullet; v_x, c) \tag{3.112}$$

describes the motion of the subsystem. The • marks the y slot,
on which R does not depend. The minus sign is introduced for
convenience. The path y can be found by integrating
equation (3.110) using the independently determined path x.
Define the action

$$S'_c[x](t_1, t_2) = \int_{t_1}^{t_2} L_c \circ \Gamma[x]. \tag{3.113}$$

The realizable paths x satisfy the Lagrange equations with the
Lagrangian L_c, so the action S'_c is stationary with respect to
variations ξ of x that are zero at the end times:

$$\delta_\xi S'_c[x](t_1, t_2) = 0. \tag{3.114}$$


For realizable paths q the action S[q](t_1, t_2) is stationary with
respect to variations η of q that are zero at the end times. Along
these paths the momentum p_y(t) has the constant value c. For
these same paths the action S'_c[x](t_1, t_2) is stationary with
respect to variations ξ of x that are zero at the end times. The
dimension of ξ is smaller than the dimension of η.

The values of the actions S'_c[x](t_1, t_2) and S[q](t_1, t_2) are
related. By equation (3.102), L ∘ Γ[q] = L_c ∘ Γ[x] + c Dy along
such a path, so

$$S'_c[x](t_1, t_2) = S[q](t_1, t_2) - \int_{t_1}^{t_2} c\,Dy = S[q](t_1, t_2) - c\,\bigl(y(t_2) - y(t_1)\bigr). \tag{3.115}$$
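The reduction and relation (3.115) can be checked numerically on a central-force orbit. The sketch assumes unit mass, the potential V(r) = −K/r with K = 1, and a particular bound initial state. These choices, the names, and the fixed-step Runge-Kutta integrator are all illustrative assumptions. The sketch integrates the full polar equations and the reduced system (Lagrange's equation for L_c plus Dφ = c/(Mr²) from (3.110)). It then compares the radial motions and evaluates both actions with the trapezoid rule.

```python
M, K = 1.0, 1.0   # assumed mass and strength of the potential V(r) = -K/r

def full_rhs(y):
    # Lagrange's equations in polar coordinates; y = (r, phi, rdot, phidot).
    r, phi, rdot, phidot = y
    return [rdot, phidot, r * phidot**2 - K / (M * r**2), -2 * rdot * phidot / r]

def reduced_rhs(y, c):
    # Lagrange's equation for L_c = M rdot^2/2 - c^2/(2 M r^2) + K/r,
    # plus Dphi = c/(M r^2) from (3.110); y = (r, rdot, phi).
    r, rdot, phi = y
    return [rdot, c**2 / (M**2 * r**3) - K / (M * r**2), c / (M * r**2)]

def rk4_step(f, y, dt):
    k1 = f(y)
    k2 = f([u + dt / 2 * k for u, k in zip(y, k1)])
    k3 = f([u + dt / 2 * k for u, k in zip(y, k2)])
    k4 = f([u + dt * k for u, k in zip(y, k3)])
    return [u + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for u, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]

def trapezoid(values, dt):
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

r0, phidot0, dt, steps = 1.0, 1.2, 1e-3, 5000
c = M * r0**2 * phidot0                      # the conserved p_phi on this path
full, red = [r0, 0.0, 0.0, phidot0], [r0, 0.0, 0.0]
L_vals, Lc_vals = [], []
for i in range(steps + 1):
    r, _, rdot, phidot = full
    L_vals.append(0.5 * M * (rdot**2 + r**2 * phidot**2) + K / r)
    r_red, rdot_red, _ = red
    Lc_vals.append(0.5 * M * rdot_red**2 - c**2 / (2 * M * r_red**2) + K / r_red)
    if i < steps:
        full = rk4_step(full_rhs, full, dt)
        red = rk4_step(lambda y: reduced_rhs(y, c), red, dt)

S, S_c = trapezoid(L_vals, dt), trapezoid(Lc_vals, dt)
r_gap = abs(full[0] - red[0])                    # reduced vs full radial motion
phi_gap = abs(full[1] - red[2])                  # y = phi from integrating (3.110)
action_gap = abs(S_c - (S - c * (red[2] - 0.0))) # equation (3.115)
print(r_gap, phi_gap, action_gap)
```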


Exercise 3.9: Routhian equations of motion 


Verify that the equations of motion are given by equations (3.109) to 
(3.111). 


3.5 Phase Space Evolution 


Most problems do not have enough symmetries to be reducible 
to quadrature. It is natural to turn to numerical integration to 
learn more about the evolution of such systems. The evolution in 
phase space may be found by numerical integration of Hamilton’s 
equations. 

Hamilton’s equations are already in first order form; the Hamil- 
tonian state derivative is the same as the phase-space derivative: 


(define Hamiltonian->state-derivative
  phase-space-derivative)


As an illustration consider again the periodically driven pendu- 
lum (see section 1.6.2). The Hamiltonian is 


(show-expression
 ((Lagrangian->Hamiltonian
   (L-periodically-driven-pendulum 'm 'l 'g 'a 'omega))
  (up 't 'theta 'p_theta)))
Figure 3.9 This is a phase-space picture of the evolution of the driven
pendulum. The phase-space view of the evolution reveals some interest- 
ing structure. 


$$-\frac{1}{2}a^2 m\omega^2(\cos\theta)^2(\sin\omega t)^2 + agm\cos\omega t + \frac{a\omega p_\theta\sin\theta\sin\omega t}{l} - glm\cos\theta + \frac{p_\theta^2}{2l^2m}$$


Hamilton’s equations for the periodically driven pendulum are un- 
revealing, so we will not show them. We build a system derivative 
from the Hamiltonian: 


(define (H-pend-sysder m l g a omega)
  (Hamiltonian->state-derivative
   (Lagrangian->Hamiltonian
    (L-periodically-driven-pendulum m l g a omega))))
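Hamilton's equations that H-pend-sysder produces are not printed in the text. As a sketch, the Python below differentiates the printed Hamiltonian by hand, Dθ = ∂H/∂p_θ and Dp_θ = −∂H/∂θ, and checks the result against central finite differences of H. The function names and the sample state are illustrative assumptions. The parameter values follow the evolve example below.

```python
import math

def H(t, theta, p, m, l, g, a, omega):
    # Driven-pendulum Hamiltonian for L = T - V, term by term as printed above.
    s, c, st = math.sin(theta), math.cos(theta), math.sin(omega * t)
    return (-0.5 * a**2 * m * omega**2 * c**2 * st**2 + a * g * m * math.cos(omega * t)
            + a * omega * p * s * st / l - g * l * m * c + p**2 / (2 * l**2 * m))

def state_derivative(t, theta, p, m, l, g, a, omega):
    # Hamilton's equations, differentiated by hand:
    # Dtheta = dH/dp_theta, Dp_theta = -dH/dtheta.
    s, c, st = math.sin(theta), math.cos(theta), math.sin(omega * t)
    dtheta = p / (m * l**2) + a * omega * s * st / l
    dp = -(a * omega * p * c * st / l
           + a**2 * m * omega**2 * s * c * st**2 + g * l * m * s)
    return dtheta, dp

params = dict(m=1.0, l=1.0, g=9.8, a=0.1, omega=2 * math.sqrt(9.8))
t0, th0, p0, h = 0.37, 1.0, 0.4, 1e-6
fd = ((H(t0, th0, p0 + h, **params) - H(t0, th0, p0 - h, **params)) / (2 * h),
      -(H(t0, th0 + h, p0, **params) - H(t0, th0 - h, p0, **params)) / (2 * h))
exact = state_derivative(t0, th0, p0, **params)
print(exact, fd)
```

The Dθ line is just (3.116), given later in the text, solved for θ̇.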


Now we integrate this system, with the same initial conditions as 
in chapter 1 (see figure 1.7), but displaying the trajectory in phase 
space (figure 3.9). We make a monitor procedure to display the 
evolution in phase space: 
(define ((monitor-p-theta win) state)
  (let ((q ((principal-value pi) (coordinate state)))
        (p (momentum state)))
    (plot-point win q p)))


We use evolve to explore the evolution of the system:


(define window (frame -pi pi -10.0 10.0))

(let ((m 1.)                          ;m=1kg
      (l 1.)                          ;l=1m
      (g 9.8)                         ;g=9.8m/s^2
      (A 0.1)                         ;A=1/10 m
      (omega (* 2 (sqrt 9.8))))
  ((evolve H-pend-sysder m l g A omega)
   (up 0.0                            ;t_0=0
       1.0                            ;theta_0=1 radian
       0.0)                           ;p_theta_0=0 kg m^2/s
   (monitor-p-theta window)
   0.01                               ;plot interval
   100.0                              ;final time
   1.0e-12))


The trajectory sometimes oscillates and sometimes circulates. The 
patterns in the phase plane are reminiscent of the trajectories in 
the phase plane of the undriven pendulum shown in figure 3.4. 
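The evolve call above relies on scmutils for integration and plotting. The Python below is a rough stand-in for it. It assumes a fixed-step fourth-order Runge-Kutta rule with ten substeps per plot interval is accurate enough. It integrates the same Hamilton's equations from the same initial state (θ = 1, p_θ = 0) and collects the (θ, p_θ) points that monitor-p-theta would plot, with θ reduced to [−π, π). The function names are hypothetical, and the run is shortened to 20 s to keep it quick.

```python
import math

def principal_value(angle):
    # Reduce an angle to [-pi, pi), the role of (principal-value pi).
    return (angle + math.pi) % (2 * math.pi) - math.pi

def sysder(t, y, m, l, g, a, omega):
    # Hamilton's equations for the driven pendulum (L = T - V); y = (theta, p_theta).
    theta, p = y
    s, c, st = math.sin(theta), math.cos(theta), math.sin(omega * t)
    return [p / (m * l**2) + a * omega * s * st / l,
            -(a * omega * p * c * st / l + a**2 * m * omega**2 * s * c * st**2
              + g * l * m * s)]

def rk4_step(f, t, y, dt):
    k1 = f(t, y)
    k2 = f(t + dt / 2, [u + dt / 2 * k for u, k in zip(y, k1)])
    k3 = f(t + dt / 2, [u + dt / 2 * k for u, k in zip(y, k2)])
    k4 = f(t + dt, [u + dt * k for u, k in zip(y, k3)])
    return [u + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for u, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]

def phase_space_points(theta0, p0, params, plot_interval=0.01, t_final=100.0, substeps=10):
    # Collect (theta, p_theta) every plot_interval instead of plotting it.
    f = lambda t, y: sysder(t, y, **params)
    dt = plot_interval / substeps
    t, y = 0.0, [theta0, p0]
    points = [(principal_value(theta0), p0)]
    for _ in range(int(round(t_final / plot_interval))):
        for _ in range(substeps):
            y = rk4_step(f, t, y, dt)
            t += dt
        points.append((principal_value(y[0]), y[1]))
    return points

params = dict(m=1.0, l=1.0, g=9.8, a=0.1, omega=2 * math.sqrt(9.8))
points = phase_space_points(1.0, 0.0, params, t_final=20.0)
print(len(points), points[-1])
```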


3.5.1 Phase Space Description is Not Unique 


We are familiar with the fact that a given motion of a system is 
expressed differently in different coordinate systems: the functions 
that express a motion in rectangular coordinates are different from 
the functions that express the same motion in polar coordinates. 
However, with a given coordinate system the evolution of the local 
state tuple for particular initial conditions is unique. The general- 
ized velocity path function is the derivative of the generalized co- 
ordinate path function. On the other hand, the coordinate system 
alone does not uniquely specify the phase-space description. The 
relationship of the momentum to the coordinates and the veloci- 
ties depends on the Lagrangian, and many different Lagrangians 
may be used to describe the behavior of the same physical system. 
When two Lagrangians for the same physical system are different 
the phase-space descriptions of a dynamical state are different. 
We have already seen two different Lagrangians for the driven 
pendulum (see section 1.6.2). One was found using L = T − V, and
Figure 3.10 An orbit of the driven pendulum in the phase space
using L = T − V is shown in the upper plot. In the lower plot the same
trajectory is shown in the phase space for the alternate Lagrangian. The 
evolution is the same, but the phase space representations are not the 
same. 
the other was found by inspection of the equations of motion. The
two Lagrangians differ by a total time derivative. The momentum
p_θ conjugate to θ depends on which Lagrangian we choose to work
with, and the description of the evolution in the corresponding
phase space also depends on the choice of Lagrangian, even though
the behavior of the system is independent of the method used to
describe it. The momentum conjugate to θ, using the L = T − V
Lagrangian, is


$$p_\theta = ml^2\dot\theta - alm\omega\sin\theta\sin\omega t \tag{3.116}$$

and the momentum conjugate to θ, using the alternate Lagrangian,
is

$$p_\theta = ml^2\dot\theta. \tag{3.117}$$


The two momenta differ by an additive distortion that varies
periodically in time and depends on θ. That the phase-space
descriptions are different is illustrated in figure 3.10. The evolution
of the system is the same for each.
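Equation (3.116) can be recovered directly. Assume, as the agm cos ωt term in the Hamiltonian above indicates, that the pivot is driven vertically as y_s(t) = a cos ωt, with the bob at x = l sin θ, y = y_s(t) − l cos θ. Then

$$T = \tfrac12 m\left(l^2\dot\theta^2 - 2al\omega\,\dot\theta\sin\theta\sin\omega t + a^2\omega^2\sin^2\omega t\right), \qquad V = mg\,(a\cos\omega t - l\cos\theta),$$

$$p_\theta = \frac{\partial(T - V)}{\partial\dot\theta} = ml^2\dot\theta - alm\omega\sin\theta\sin\omega t.$$

Subtracting a total time derivative DF of a function F(t, θ) shifts the momentum by −∂F/∂θ. One choice that yields (3.117) is F(t, θ) = almω cos θ sin ωt. This F is offered only as an illustration; it is not necessarily the function behind the alternate Lagrangian of section 1.6.2.

$$p_\theta - \frac{\partial F}{\partial\theta} = ml^2\dot\theta - alm\omega\sin\theta\sin\omega t + alm\omega\sin\theta\sin\omega t = ml^2\dot\theta.$$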


3.6 Surfaces of Section 


Computing the evolution of mechanical systems is just the begin- 
ning of understanding the dynamics. Typically, we want to know 
much more than the phase space evolution of some particular tra- 
jectory. We want to obtain a qualitative understanding of the 
motion. We want to know what sorts of motion are possible, and 
how one type relates to others. We want to abstract the essential 
dynamics from the myriad particular evolutions that we can cal- 
culate. One tool that we can bring to bear on this problem is a 
technique called the surface of section or Poincaré section.²²
Paradoxically, it turns out that by throwing away most of the 
calculated information about a trajectory we gain essential new 
information about the character of the trajectory and its relation 


²² The surface of section technique was introduced by Poincaré in his Méthodes
Nouvelles de la Mécanique Céleste. Poincaré proved remarkable results about 
dynamical systems using the surface of section technique, and we shall return 
to those later. The surface of section technique is a key tool in the modern 
study of dynamical systems, for both analytical and numerical investigations. 
to other trajectories. A surface of section is generated by looking
at successive intersections of a trajectory or a set of trajectories 
with a plane in the phase space. Typically, the plane is spanned 
by a coordinate axis and the canonically conjugate momentum 
axis. We will see that surfaces of section made in this way have 
nice properties. The collection of points generated on these
surfaces of section reveals important qualitative information about
the nature of the trajectories and the relationship among various
types of trajectories.²³ The surface of section reveals two
qualitatively different types of motion: regular and chaotic. An
essential characteristic of the chaotic motions is that initially nearby
trajectories separate exponentially with time; the separation of
regular trajectories is linear.²⁴ These two types of trajectories are found
to be clustered in regions of regular motion and regions of chaotic 
motion. 


3.6.1 Poincaré Sections for Periodically-Driven Systems 


For a periodically driven system the surface of section is a strobo- 
scopic view of the evolution; we consider only the state of the sys- 
tem at the strobe times, with the period of the strobe equal to the 
drive period. We generate a surface of section for a periodically- 
driven system by computing a number of trajectories and accumu- 
lating the phase-space coordinates of each trajectory whenever the 
drive passes through some particular phase. Let T be the period
of the drive. Then, for each trajectory, the surface of section
accumulates the phase-space points (q(t), p(t)), (q(t + T), p(t + T)),
(q(t + 2T), p(t + 2T)), and so on (see figure 3.11). For a system


²³ The surface of section technique was put to spectacular use in the 1964
landmark paper [19] by astronomers Michel Hénon and Carl Heiles. In their 
numerical investigations they found that some trajectories are chaotic, and 
show exponential divergence with time, while other trajectories are regular, 
showing linear divergence with time. They found that these two types of 
trajectories are typically clustered in the phase space into regions of chaotic 
behavior and regions of regular behavior. 


²⁴ That solutions of ordinary differential equations can show exponential
sensitivity to initial conditions was independently discovered by Edward
Lorenz [28] in the context of a simplified model of convection in the Earth's
atmosphere. Lorenz coined the picturesque term the “butterfly effect” to
describe this sensitivity. The weather system model of Lorenz is so sensitive to
initial conditions that “the flapping of a butterfly's wings in Brazil can change
the course of a typhoon in Japan.”
Figure 3.11 Stroboscopic surface of section for a periodically driven
system. For each trajectory the surface of section accumulates the set 
of phase-space points after each full cycle of the drive. 


with a single degree of freedom we can plot the sequence of phase- 
space points on a q, p surface. 

In the case of the stroboscopic section for the periodically driven 
system the phase of the drive is the same for all section points, 
thus each phase-space point in the section, with the known phase 
of the drive, may be considered as an initial condition for the 
rest of the trajectory. The absolute time of the particular section 
point does not affect the subsequent evolution; all that matters is 
that the phase of the drive have the value specified for the section. 
Thus we can think of the dynamical evolution as generating a map
that takes a point in the phase space and generates a new point
in the phase space after evolving the system for one drive period.
This map of the phase space onto itself is called the Poincaré map.
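A minimal sketch of the stroboscopic Poincaré map for the driven pendulum follows. It evolves Hamilton's equations for exactly one drive period with a fixed-step fourth-order Runge-Kutta rule (an assumed integrator) and reduces the angle to [−π, π). Because every section point sits at the same drive phase, each application can restart the clock at t = 0; this is the observation in the paragraph above. The parameters follow the figure 3.12 caption. The function names and the 400 steps per period are assumptions.

```python
import math

def principal_value(angle):
    # Reduce an angle to [-pi, pi).
    return (angle + math.pi) % (2 * math.pi) - math.pi

def sysder(t, y, m, l, g, a, omega):
    # Hamilton's equations for the driven pendulum (L = T - V); y = (theta, p_theta).
    theta, p = y
    s, c, st = math.sin(theta), math.cos(theta), math.sin(omega * t)
    return [p / (m * l**2) + a * omega * s * st / l,
            -(a * omega * p * c * st / l + a**2 * m * omega**2 * s * c * st**2
              + g * l * m * s)]

def rk4_step(f, t, y, dt):
    k1 = f(t, y)
    k2 = f(t + dt / 2, [u + dt / 2 * k for u, k in zip(y, k1)])
    k3 = f(t + dt / 2, [u + dt / 2 * k for u, k in zip(y, k2)])
    k4 = f(t + dt, [u + dt * k for u, k in zip(y, k3)])
    return [u + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for u, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]

def poincare_map(theta, p, params, steps=400):
    # Evolve for one drive period T = 2 pi / omega, starting at drive phase zero.
    period = 2 * math.pi / params["omega"]
    dt = period / steps
    f = lambda tt, yy: sysder(tt, yy, **params)
    t, y = 0.0, [theta, p]
    for _ in range(steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return principal_value(y[0]), y[1]

def section_points(theta0, p0, params, n):
    # Successive stroboscopic section points of one trajectory.
    points, (theta, p) = [], (theta0, p0)
    for _ in range(n):
        theta, p = poincare_map(theta, p, params)
        points.append((theta, p))
    return points

omega0 = math.sqrt(9.8 / 1.0)
params = dict(m=1.0, l=1.0, g=9.8, a=0.05, omega=4.2 * omega0)  # figure 3.12 values
orbit = section_points(0.5, 0.0, params, 50)
print(orbit[:3])
```

The hanging equilibrium (θ, p_θ) = (0, 0) is a fixed point of this map, because every term of Hamilton's equations vanishes there.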

Figure 3.12 shows an example Poincaré section for the driven 
pendulum. We plot the section points for a number of different 
initial conditions. We are immediately presented with a new facet 
Figure 3.12 Surface of section for the driven pendulum. The angle is
plotted on the abscissa; the momentum conjugate to this angle is plotted
on the ordinate. For this section the parameters are: m = 1 kg, l = 1 m,
g = 9.8 m/s², A = 0.05 m, ω = 4.2ω₀, with ω₀ = √(g/l).


of dynamical systems. For some initial conditions, the subsequent 
section points appear to fill out a set of curves in the section. For 
other initial conditions this is not the case. Rather, the set of
section points appears scattered over a region of the section. In
fact, all of the scattered points in figure 3.12 were generated from 
a single initial condition. The surface of section suggests that 
there are qualitatively different classes of trajectories that differ 
in the dimension of the subspace of the section that they explore. 

Trajectories that fill out curves on the surface of section are 
called regular trajectories.²⁵ The curves that are filled out by the
regular trajectories are invariant curves. They are invariant in 
that if any section point for a trajectory falls on an invariant curve 
all subsequent points fall on the same invariant curve. Otherwise 
stated, the Poincaré map maps every point on an invariant curve 
onto the invariant curve. 


²⁵ Regular trajectories are also called quasiperiodic trajectories.
The trajectories that appear to fill areas are called chaotic
trajectories. For these trajectories the distance in phase space between
initially nearby points grows, on average, exponentially with time.²⁶
In contrast, for the regular trajectories, the distance in phase space 
between initially nearby points grows, on average, linearly with 
time. 

The phase space seems to be grossly clumped into different
regions. Initial conditions in some regions seem to predominantly
yield regular trajectories, and other regions seem to predominantly
yield chaotic trajectories. This gross division of the phase space
into qualitatively different types of trajectories is called the di- 
vided phase space. We will see later that there is much more 
structure here than is apparent at this scale, and that upon mag- 
nification there is a complicated interweaving of chaotic and reg- 
ular regions on finer and finer scales. Indeed, we shall see that 
many trajectories which appear to generate curves on the surface 
of section are, upon magnification, actually chaotic and fill a tiny 
area. We shall also find that there are trajectories which lie on 
one-dimensional curves on the surface of section, but which only 
explore a subset of this curve formed by cutting out an infinite 
number of holes.²⁷

The features seen on the surface of section of the driven pen- 
dulum are quite general. The same phenomena are seen in most 
dynamical systems. In general, there are both regular and chaotic 
trajectories, and there is the clumping characteristic of the divided 
phase space. The specific details depend upon the system, but the 
basic phenomena are generic. Of course we are interested in both 
aspects: the phenomena which are generic to all systems, and the 
specific details for particular systems of interest. 

The surface of section for the periodically driven pendulum has 
specific features that give us qualitative information about how 
this system behaves. The central island in figure 3.12 is the rem- 
nant of the oscillation region for the unforced pendulum (see fig- 
ure 3.4). There is a sizable region of regular trajectories here that 
are, in a sense, similar to the trajectories of the unforced pendu- 


²⁶ We saw an example of this extreme sensitivity to initial conditions in
figure 1.7.


²⁷ One-dimensional invariant sets with an infinite number of holes are some-
times called cantori, by analogy to the Cantor sets, but it really doesn’t 
Mather. 
lum. In this region, the pendulum oscillates back and forth, much
as the undriven pendulum does, but the drive makes it wiggle as 
it does so. The section points are all collected at the same phase 
of the drive so we do not see these wiggles on the section. 

The central island is surrounded by a large chaotic zone. Thus 
the region of phase space with regular trajectories similar to the 
unforced trajectories has finite extent. On the section, the
boundary of this “stable” region is apparently rather well defined:
there is a sudden transition from smooth regular invariant curves to
chaotic motion that can take the system far from this region of
regular motion.

There are two other sizeable regions of regular behavior. The 
trajectories in these regions are resonant with the drive, on av- 
erage executing one full rotation per cycle of the drive. The two 
islands differ in the direction of the rotation. In these regions 
the pendulum is making complete rotations, but the rotation is 
locked to the drive so that points on the section appear only in the 
islands with finite angular extent. The fact that points for partic- 
ular trajectories loop around the islands means that the pendulum 
sometimes completes a cycle faster than the drive and sometimes 
slower than the drive, but never loses lock. 

Each regular region has finite extent. So from the surface of 
section we can see directly the range of initial conditions which 
remain in resonance with the drive. Outside of the regular region 
initial conditions lead to chaotic trajectories which evolve far from 
the resonant regions. 

Various higher order resonance islands are also visible, as are 
non-resonant regular circulating orbits. So, the surface of section 
has provided us with an overview of the main types of motion that 
are possible and their relationship. 

If we change the parameters we can see other interesting phe- 
nomena. Figure 3.13 shows the surface of section when the drive 
frequency is twice the natural small amplitude oscillation fre- 
quency of the undriven pendulum. The section has a large chaotic 
zone, with an interesting set of islands. The central equilibrium 
has undergone an instability and instead of a central island we 
find two off-center islands. These islands are alternately visited 
one after the other. As the support goes up and down the pendu- 
lum alternately tips to one side and then the other. It takes two 
periods of the drive before the pendulum visits the same island. 
Thus, the system has “period doubled.” An island has been re- 
Figure 3.13 Another surface of section for the driven pendulum,
illustrating a period-doubled central island. For this section the frequency
of the drive is resonant with the frequency of small amplitude oscilla-
tions of the undriven pendulum. The angle is plotted on the abscissa
(scale −π to π); the momentum conjugate to this angle is plotted on the
ordinate (scale −10 to 10 kg m²/s). For this section the parameters are:
m = 1 kg, l = 1 m, g = 9.8 m/s², A = 0.1 m, ω = 2ω₀.


placed by a period-doubled pair of islands. Note that other islands 
still exist. The islands in the top and bottom of the chaotic zone 
are the resonant islands, in which the pendulum loops on average 
a full turn for every cycle of the drive. Note that, as before, if the 
pendulum is rapidly circulating, the motion is regular. 

It is a surprising fact that if we shake the support of a pendu- 
lum fast enough then the pendulum can stand upright. This phe- 
nomenon can be visualized with the surface of section. Figure 3.14 
shows a surface of section when the drive frequency is large com- 
pared to the natural frequency. The pendulum can stand upright 
because there is a regular island at the inverted equilibrium. The
surface of section shows that the pendulum can remain upright
for a range of initial displacements from the vertical.
Figure 3.14 Surface of section for a rapidly driven pendulum,
illustrating a vertical equilibrium. The angle is plotted on the abscissa
(scale −π to π); the momentum conjugate to this angle is plotted on the
ordinate (scale −20 to 20 kg m²/s). For this section the parameters are:
m = 1 kg, l = 1 m, g = 9.8 m/s², A = 0.2 m, ω = 10.1ω₀.


3.6.2 Computing Stroboscopic Surfaces of Section 


We already have the system derivative for the pendulum, and we
can use it to build a parametric map for constructing Poincaré
sections.


(define (driven-pendulum-map m l g A omega)
  (let ((advance (state-advancer H-pend-sysder m l g A omega))
        (map-period (/ 2pi omega)))
    (lambda (theta ptheta return fail)
      (let ((ns (advance
                 (up 0 theta ptheta)          ; initial state
                 map-period)))                ; integration interval
        (return ((principal-value pi) (coordinate ns))
                (momentum ns))))))


A map procedure takes the two section coordinates (here theta 
and ptheta) and two “continuation” procedures. If the section 
coordinates given are in the domain of the map, it produces two 
new section coordinates and passes them to the return
continuation, otherwise the map procedure calls the fail continuation
procedure with no arguments.²⁸
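The return/fail convention can be mimicked in Python by passing the two continuations as ordinary functions. In this sketch all names are hypothetical. `make_map` wraps a one-step advancer and a domain test as a map procedure with the (x, y, return, fail) calling convention. `collect_orbit` is a non-graphical stand-in for explore-map: it iterates the map and stops as soon as `fail` is called. Two toy maps are assumed for illustration. One never fails: it is the exact flow of a unit harmonic oscillator over a fixed time, whose orbits lie on invariant circles. The other fails once |y| exceeds 10, standing in for escaping orbits.

```python
import math

def make_map(advance, in_domain):
    # Build a map procedure with the (x, y, return, fail) calling convention.
    def the_map(x, y, ret, fail):
        if not in_domain(x, y):
            return fail()
        nx, ny = advance(x, y)
        return ret(nx, ny)
    return the_map

def collect_orbit(the_map, x, y, n):
    # Up to n section points from (x, y), stopping if the map calls fail.
    points = []
    for _ in range(n):
        step = the_map(x, y, lambda nx, ny: (nx, ny), lambda: None)
        if step is None:               # the fail continuation was invoked
            break
        x, y = step
        points.append(step)
    return points

def rotate(x, y, angle=0.3):
    # Toy map 1: exact harmonic-oscillator flow in (q, p) for a time equal to angle.
    return (x * math.cos(angle) + y * math.sin(angle),
            -x * math.sin(angle) + y * math.cos(angle))

circle_map = make_map(rotate, lambda x, y: True)
# Toy map 2: orbits drift upward and leave the domain once |y| exceeds 10.
escape_map = make_map(lambda x, y: (x, y + 4.0), lambda x, y: abs(y) <= 10.0)

ring = collect_orbit(circle_map, 1.0, 0.0, 1000)
escaping = collect_orbit(escape_map, 0.0, 0.0, 1000)
print(len(ring), escaping)
```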

The trajectories of a map can be explored with an “interactive” 
interface. The procedure explore-map allows us to use a pointing 
device to choose initial conditions for trajectories. For example, 
the surface of section in figure 3.12 was generated by plotting a 
number of trajectories, using a pointer to choose initial conditions, 
with the following program: 


(define win (frame -pi pi -20 20))

(let ((m 1.0)                         ;m=1kg
      (l 1.0)                         ;l=1m
      (g 9.8)                         ;g=9.8m/s^2
      (A 0.05))                       ;A=1/20m
  (let ((omega0 (sqrt (/ g l))))
    (let ((omega (* 4.2 omega0)))
      (explore-map
       win
       (driven-pendulum-map m l g A omega)
       1000))))                       ;1000 points for each ic


Exercise 3.10: Fun with phase portraits 


Choose some one-degree-of-freedom dynamical system that you are cu- 
rious about and that can be driven with a periodic drive. Construct a 
map of the sort we made for the driven pendulum and do some explor- 
ing. Are there chaotic regions? Are all of the chaotic regions connected 
together? 


3.6.3 Poincaré Sections for Autonomous Systems 


We illustrated the use of Poincaré sections to visualize qualitative 
features of the phase space for a one degree-of-freedom system 
with periodic drive, but the idea is more general. Here we show 
how Hénon and Heiles used the surface of section to elucidate the 
properties of an autonomous system. 


Hénon-Heiles background 
In the early 1960s astronomers were up against a wall. Careful
measurements of the motion of nearby stars in the galaxy had allowed


²⁸ In the particular case of the driven pendulum there is no reason to call fail.
This contingency is reserved for systems where orbits escape or cease to satisfy
some constraint.
particular statistical averages of the observed motions to be
determined, and the averages were not at all what was expected. In
particular, what was calculated was the velocity dispersion: the
root mean square deviation of the velocity from the average. We
use angle brackets to denote an average over nearby stars: ⟨w⟩ is
the average value of some quantity w for the stars in the
ensemble. The average velocity is $\langle \dot{\vec x} \rangle$. The components of the velocity
dispersion are

$$\sigma_x = \left\langle (\dot x - \langle \dot x \rangle)^2 \right\rangle^{1/2} \tag{3.118}$$

$$\sigma_y = \left\langle (\dot y - \langle \dot y \rangle)^2 \right\rangle^{1/2} \tag{3.119}$$

$$\sigma_z = \left\langle (\dot z - \langle \dot z \rangle)^2 \right\rangle^{1/2}. \tag{3.120}$$


If we use cylindrical polar coordinates (r, θ, z) and align the axes
with the galaxy so that z is perpendicular to the galactic plane
and r increases with the distance to the center of the galaxy, then
two particular components of the velocity dispersion are:

$$\sigma_z = \left\langle (\dot z - \langle \dot z \rangle)^2 \right\rangle^{1/2} \tag{3.121}$$

$$\sigma_r = \left\langle (\dot r - \langle \dot r \rangle)^2 \right\rangle^{1/2}. \tag{3.122}$$
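Equations (3.118) through (3.122) are ordinary sample statistics. The sketch below computes a dispersion. It also converts a star's Cartesian position and velocity into the cylindrical components ṙ = (xẋ + yẏ)/r and ż, so that the ratio σ_r/σ_z can be formed. The function names are hypothetical, and the sample values are toy numbers for illustration, not observational data.

```python
import math

def mean(values):
    return sum(values) / len(values)

def dispersion(values):
    # Root mean square deviation from the average: <(v - <v>)^2>^(1/2).
    v_bar = mean(values)
    return math.sqrt(mean([(v - v_bar)**2 for v in values]))

def cylindrical_velocity(x, y, vx, vy, vz):
    # With z normal to the galactic plane: rdot = (x vx + y vy) / r and zdot = vz.
    r = math.hypot(x, y)
    return (x * vx + y * vy) / r, vz

def sigma_r_over_sigma_z(stars):
    # stars: (x, y, vx, vy, vz) tuples for an ensemble of nearby stars.
    components = [cylindrical_velocity(*star) for star in stars]
    sigma_r = dispersion([rdot for rdot, _ in components])
    sigma_z = dispersion([zdot for _, zdot in components])
    return sigma_r / sigma_z

toy_stars = [(1.0, 0.0, 2.0, 0.0, 1.0), (1.0, 0.0, -2.0, 0.0, -1.0)]   # toy numbers only
print(dispersion([1.0, 2.0, 3.0, 4.0, 5.0]), sigma_r_over_sigma_z(toy_stars))
```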


It was the expectation at the time that these two components of
the velocity dispersion should be equal. In fact they were found
to differ by about a factor of 2: σ_r ≈ 2σ_z. What was the
problem? In the literature at the time there was considerable
discussion of what could be wrong. Was the problem some
observational selection effect? Were the velocities measured incorrectly?
Were the assumptions used in the derivation of the expected ratio 
not adequately satisfied? For example, the derivation assumed 
that the galaxy was approximately axisymmetric. Perhaps non- 
axisymmetric components of the galactic potential were at fault. 
It turned out that the problem was much deeper. The under- 
standing of motion was wrong. 

Let’s review the derivation of the expected relation among the components of the velocity dispersion. We wish to give a statistical description of the distribution of stars in the galaxy. We introduce the phase-space distribution function $f(\vec{x}, \vec{p})$, which gives the probability density of finding a star at position $\vec{x}$ with momentum $\vec{p}$.²⁹ Integrating this density over some finite volume of phase space gives the probability of finding a star in that phase-space volume (in that region of space within a specified region of momenta). We assume the probability density is normalized so that the integral over all of phase space gives unit probability; the star is somewhere and has some momentum with certainty. In terms of f the statistical average of any dynamical quantity w over some volume of phase space V is just

$$\langle w \rangle_V = \int_V f w \qquad (3.123)$$

where the integral extends over the phase-space volume V. In computing the velocity dispersion at some point $\vec{x}$, we would compute the averages by integrating over all momenta.

Individual stars move in the gravitational potential of the rest 
of the galaxy. It is not unreasonable to assume that the overall 
distribution of stars in the galaxy does not change much with 
time, or changes only very slowly. The density of stars in the 
galaxy is actually very small and close encounters of stars are 
very rare. Thus, we can model the gravitational potential of the 
galaxy as a fixed external potential in which individual stars move. 
The galaxy is approximately axisymmetric. We assume that the 
deviation from exact axisymmetry is not a significant effect and 
thus we take the model potential to be exactly axisymmetric. 

29 We will see that it is convenient to look at distribution functions in the phase-space coordinates because the consequences of conserved momenta are more apparent, but also because volume in phase space is conserved by evolution (see section 3.8).

Consider the motion of a point mass (a star) in an axisymmetric potential (of the galaxy). In cylindrical polar coordinates the Hamiltonian is

$$H(t; r, \theta, z; p_r, p_\theta, p_z) = \frac{p_r^2}{2m} + \frac{p_\theta^2}{2mr^2} + \frac{p_z^2}{2m} + V(r, z), \qquad (3.124)$$

where V does not depend on θ. Since θ does not appear, we know that the conjugate momentum $p_\theta$ is constant. For the motion of any particular star we can treat $p_\theta$ as a parameter. Thus the effective Hamiltonian has two degrees of freedom

$$H(t; r, z; p_r, p_z) = \frac{1}{2m}\left[p_r^2 + p_z^2\right] + U(r, z) \qquad (3.125)$$

where

$$U(r, z) = V(r, z) + \frac{p_\theta^2}{2mr^2}. \qquad (3.126)$$

The value E of the Hamiltonian is constant since there is no explicit time dependence in the Hamiltonian. Thus, we have constants of the motion E and $p_\theta$.
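The reduction to two degrees of freedom can be sketched by treating $p_\theta$ as a fixed parameter that only modifies the potential. The helper names and the quadratic potential `V_example` are hypothetical, chosen so the numbers are easy to check by hand.

```python
def effective_potential(V, p_theta, m):
    """Return U(r, z) = V(r, z) + p_theta**2 / (2 m r**2), equation (3.126).
    p_theta is fixed for a given star, so it enters only as a parameter."""
    def U(r, z):
        return V(r, z) + p_theta ** 2 / (2.0 * m * r ** 2)
    return U

def reduced_hamiltonian(U, m, r, z, p_r, p_z):
    """Two-degree-of-freedom Hamiltonian (3.125): (p_r^2 + p_z^2)/(2m) + U(r, z)."""
    return (p_r ** 2 + p_z ** 2) / (2.0 * m) + U(r, z)

def V_example(r, z):
    """Hypothetical axisymmetric potential, for illustration only."""
    return 0.5 * (r ** 2 + 2.0 * z ** 2)

U = effective_potential(V_example, p_theta=1.0, m=1.0)
print(U(1.0, 0.0))                                       # 0.5 + 0.5 -> 1.0
print(reduced_hamiltonian(U, 1.0, 1.0, 0.0, 1.0, 1.0))   # 1.0 + 1.0 -> 2.0
```

The added term $p_\theta^2/(2mr^2)$ grows without bound as r → 0 when $p_\theta \neq 0$, so in the reduced problem it acts as a centrifugal barrier.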

Jeans’ “theorem” asserts that the distribution function f depends only on the values of the integrals of motion. That is, we can introduce a different distribution function f′ that represents the same physical distribution

$$f'(E, p_\theta) = f(\vec{x}, \vec{p}). \qquad (3.127)$$


There was good reason to believe that this might be correct. First, it is clear that the distribution function surely depends at least on E and $p_\theta$. The problem is “Given an energy E and angular momentum $p_\theta$, what motion is allowed?” The integrals clearly confine the evolution. Does the evolution carry the system everywhere in the phase space subject to these known constraints? In the early part of the 20th century this appeared plausible. Statistical mechanics was successful, and statistical mechanics made exactly this assumption. Perhaps there are other integrals of the motion which exist, but we have not yet discovered them? Poincaré proved an important theorem about integrals of the motion: most integrals of a dynamical system typically do not persist upon perturbation of the system. That is, if a small perturbation is added to a problem, then most of the integrals of the original problem do not have analogs in the perturbed problem. The integrals are destroyed. Of course, integrals which result from symmetries of the problem continue to be preserved if the perturbed system has the same symmetries. Thus angular momentum continues to be preserved upon application of any axisymmetric perturbation. Poincaré’s theorem is correct, but what came next was not. As a corollary to Poincaré’s theorem, in 1920 Fermi published a proof of an ergodic theorem, which stated that typically the motion of perturbed problems is ergodic³⁰ subject to the constraints imposed by the integrals resulting from symmetries. Loosely speaking, this means that trajectories go everywhere they are allowed to go by the integral constraints. Fermi’s theorem was later shown to be incorrect, but on the basis of this theorem we could expect that typically systems fully explore the phase space subject only to the constraints imposed by the integrals resulting from symmetries. Suppose then that the evolution of stars in the galactic potential were subject only to the constraints of conserving E and $p_\theta$. We shall see that this is not true, but if it were we could then conclude that the distribution function for stars in the galaxy can also only depend on E and $p_\theta$.

Given this form of the distribution function, we can deduce the stated ratios of the velocity dispersions. We note that $p_r$ and $p_z$ appear in the same way in the energy. Thus the average of any function of $p_r$ computed with the distribution function must equal the average of the same function of $p_z$. In particular, since $\dot{r} = p_r/m$ and $\dot{z} = p_z/m$, the velocity dispersions in the z and r directions must be equal:

$$\sigma_z = \sigma_r. \qquad (3.128)$$

But this is not what was observed, which was

$$\sigma_r \approx 2\sigma_z. \qquad (3.129)$$


Hénon and Heiles approached this problem differently than oth- 
ers at the time. Rather than improving the models for the motion 
of stars in the Galaxy, they concentrated on what turned out to be 
the central issue. What is the qualitative nature of motion? The 
problem had nothing to do with galactic dynamics in particular, 
but with the problem of motion. They abstracted the dynamical 
problem from the particulars of galactic dynamics. 


The system of Hénon and Heiles 

We have seen that the study of the motion of a point with mass m in an axisymmetric potential energy reduces to the study of a reduced two degree-of-freedom problem in r and z with potential energy U(r, z). Hénon and Heiles chose to study the motion in a two degree-of-freedom system with a particularly simple potential energy so the dynamics would be clear and the calculation uncluttered. The Hénon-Heiles Hamiltonian is

$$H(t; x, y; p_x, p_y) = \frac{1}{2}\left(p_x^2 + p_y^2\right) + V(x, y) \qquad (3.130)$$

with potential energy

$$V(x, y) = \frac{1}{2}\left(x^2 + y^2\right) + x^2 y - \frac{1}{3} y^3. \qquad (3.131)$$

30 A system is ergodic if time averages along trajectories are the same as phase-space averages over the region explored by the trajectories.


The potential energy is shaped like a distorted bowl. The poten- 
tial energy has triangular symmetry, which is evident when the 
potential energy is rewritten in polar coordinates 
$$V(r, \theta) = \frac{1}{2} r^2 + \frac{1}{3} r^3 \sin 3\theta. \qquad (3.132)$$
Contours of the potential energy are shown in figure 3.15. At small 
values of the potential energy the contours are approximately cir- 
cular; as the value of the potential energy approaches 1/6 the 
contours become triangular, and at larger potential energies the 
contours open to infinity. 
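As a quick consistency check (a sketch, not part of the original treatment), the Cartesian form (3.131) and the polar form (3.132) can be compared numerically. The snippet also evaluates V at (x, y) = (0, 1), a saddle point of the potential where V = 1/6, the contour value at which the contours open to infinity.

```python
import math

def V_cartesian(x, y):
    """Henon-Heiles potential energy, equation (3.131)."""
    return 0.5 * (x ** 2 + y ** 2) + x ** 2 * y - y ** 3 / 3.0

def V_polar(r, theta):
    """The same potential written in polar coordinates, equation (3.132)."""
    return 0.5 * r ** 2 + r ** 3 * math.sin(3.0 * theta) / 3.0

def hamiltonian(x, y, px, py):
    """Henon-Heiles Hamiltonian, equation (3.130); no explicit t dependence."""
    return 0.5 * (px ** 2 + py ** 2) + V_cartesian(x, y)

# The two forms agree at arbitrary sample points.
for r, theta in [(0.3, 0.1), (0.5, 1.2), (0.7, -2.0)]:
    x, y = r * math.cos(theta), r * math.sin(theta)
    assert abs(V_cartesian(x, y) - V_polar(r, theta)) < 1e-12

# Triangular symmetry: rotating by 2*pi/3 leaves V unchanged.
assert abs(V_polar(0.5, 0.2) - V_polar(0.5, 0.2 + 2.0 * math.pi / 3.0)) < 1e-12

# (0, 1) is a saddle of V; its value is the escape energy 1/6.
print(V_cartesian(0.0, 1.0))
```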

The Hamiltonian is time independent, so energy is conserved. 
In this case this is the only known integral. We first determine the 
restrictions that conservation of energy imposes on the evolution. 


We have

$$E = \frac{1}{2}\left(p_x^2 + p_y^2\right) + V(x, y) \geq V(x, y), \qquad (3.133)$$


so the motion is confined to the region inside the contour V = E 
because the sum of the squares of the momenta cannot be negative. 

Let’s compute some sample trajectories. For definiteness, we investigate trajectories with energy E = 1/8. There is a large variety of trajectories. There are trajectories that circulate in a regular way around the bowl, and there are trajectories that oscillate back and forth (figure 3.16). There are also trajectories that appear more irregular (figure 3.17). There is no end to the trajectories that could be computed, but let’s face it, surely there is more to life than looking at trajectories.

The problem facing Hénon and Heiles was the issue of integrals 
of motion. Are there other integrals besides the obvious ones? 
They investigated this issue with the surface of section technique. 
Figure 3.15 Contours of the Hénon-Heiles potential energy. The contours shown, from the inside out, are for potential energies: 1/100, 1/40, 1/20, 1/12, 1/8, and 1/6.


The surface of section is generated by looking at successive pas- 
sages of trajectories through a plane in phase space. How does 
this address the issue of the number of integrals? A priori, there 
appear to be two possibilities: either there are hidden integrals 
or there are not. Suppose there is no other integral of the mo- 
tion besides the energy. Then the expectation was that succes- 
sive intersections of the trajectory with the section plane would 
eventually explore all of the section plane that is consistent with 
conservation of energy. On the other hand, if there is a hidden 
integral then the successive intersections would be constrained to 
fall on a curve. 

Specifically, the surface of section is generated by recording and plotting $p_y$ versus y whenever x = 0, as shown in figure 3.18. Given the value of the energy E and a point $(y, p_y)$ on the section x = 0, we can recover $p_x$ up to a sign. If we restrict attention to intersections with the section plane that cross with, say, positive $p_x$, then there is a one-to-one relation between section points and trajectories. A section point thus corresponds to a unique trajectory.

Figure 3.16 Two trajectories of the Hénon-Heiles Hamiltonian projected on the (x, y) plane. The energy is E = 1/8.

Figure 3.17 Another trajectory of the Hénon-Heiles Hamiltonian projected on the (x, y) plane. The energy is E = 1/8.
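The section procedure just described can be sketched in Python. This is a minimal illustration under stated assumptions: it uses a fixed-step fourth-order Runge-Kutta integrator and locates each crossing of x = 0 by linear interpolation between steps, which is cruder than what one would use for publication-quality sections. The names `rhs`, `rk4_step`, `section_px`, and `surface_of_section` are hypothetical. The forces come from differentiating (3.131): $-\partial V/\partial x = -(x + 2xy)$ and $-\partial V/\partial y = -(y + x^2 - y^2)$.

```python
import math

def V(x, y):
    """Henon-Heiles potential energy (3.131)."""
    return 0.5 * (x * x + y * y) + x * x * y - y ** 3 / 3.0

def rhs(s):
    """Hamilton's equations for s = (x, y, px, py)."""
    x, y, px, py = s
    return (px, py, -(x + 2.0 * x * y), -(y + x * x - y * y))

def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(s)
    k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
    k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
    k4 = rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def section_px(E, y, py):
    """Positive p_x on the section x = 0 recovered from the energy, or None
    if the point (y, py) is not allowed at energy E."""
    px2 = 2.0 * (E - V(0.0, y)) - py * py
    return math.sqrt(px2) if px2 >= 0.0 else None

def surface_of_section(E, y0, py0, n_points, h=0.01, max_steps=10 ** 6):
    """Return up to n_points section points (y, py), recorded at crossings
    of x = 0 in the direction of increasing x."""
    px0 = section_px(E, y0, py0)
    if px0 is None:
        raise ValueError("initial point lies outside the allowed region")
    s = (0.0, y0, px0, py0)
    points = []
    for _ in range(max_steps):
        s_new = rk4_step(s, h)
        if s[0] < 0.0 <= s_new[0]:                 # crossed x = 0 going +x
            f = -s[0] / (s_new[0] - s[0])          # linear interpolation weight
            points.append((s[1] + f * (s_new[1] - s[1]),
                           s[3] + f * (s_new[3] - s[3])))
            if len(points) == n_points:
                break
        s = s_new
    return points

pts = surface_of_section(1.0 / 8.0, 0.1, 0.0, 20)
print(len(pts), pts[0])
```

Collecting a few hundred points for each of several starting values of $(y, p_y)$ and plotting them would give pictures of the kind shown in the figures; a root-finding or event-detecting integrator would locate the crossings more accurately than linear interpolation.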
On the section, the energy is

$$E = H(t; 0, y; p_x, p_y) = \frac{1}{2}\left(p_x^2 + p_y^2\right) + V(0, y). \qquad (3.134)$$

Because $p_x^2$ is non-negative, the trajectory is confined by the energy integral to regions of the section such that

$$E \geq \frac{1}{2} p_y^2 + V(0, y). \qquad (3.135)$$

So, if there is no other integral, we might expect the points on the section to eventually fill the area enclosed by this bounding curve.
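The bounding curve of (3.135) is where $p_x = 0$, so it can be sampled directly from $V(0, y) = y^2/2 - y^3/3$. A sketch with hypothetical names; the sampling window [-1, 1] is an assumption that is adequate for energies below the escape energy 1/6, where the allowed interval of y around 0 lies inside it.

```python
import math

def V0(y):
    """Henon-Heiles potential on the line x = 0: V(0, y) = y^2/2 - y^3/3."""
    return 0.5 * y * y - y ** 3 / 3.0

def section_boundary(E, n=200, y_lo=-1.0, y_hi=1.0):
    """Sample the upper half of the curve E = py^2/2 + V(0, y) that bounds
    the allowed part of the section (3.135); the lower half is py -> -py."""
    curve = []
    for i in range(n + 1):
        y = y_lo + (y_hi - y_lo) * i / n
        slack = E - V0(y)                 # room left for py^2 / 2 when px = 0
        if slack >= 0.0:
            curve.append((y, math.sqrt(2.0 * slack)))
    return curve

boundary = section_boundary(1.0 / 8.0)
print(max(py for _, py in boundary))      # 0.5, attained at y = 0
```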

On the other hand, suppose there is a hidden extra integral $I(x, y; p_x, p_y) = 0$. Then this integral would provide further constraints on the trajectories and their intersections with the section plane. An extra integral I provides a constraint between the phase-space variables. We can use E to solve for $p_x$, and x = 0 on the section, so the extra integral reduces to a relation between y and $p_y$ on the section. So we expect that if there is another integral the successive intersections of a trajectory with the section plane will fall on a curve.

Figure 3.18 The surface of section for the Hénon-Heiles problem is generated by recording and plotting the successive crossings of the x = 0 plane in the direction of increasing x.

If there is no extra integral we expect the section points to fill 
an area; if there is an extra integral we expect the section points 
to be restricted to a curve. What actually happens? Figure 3.19 
shows a surface of section for E = 1/12. On the section the 
section points for several representative trajectories are displayed. 
By and large, the points appear to be restricted to curves; so there 
appears to be evidence for an extra integral. Look closely though. 
Where the “curves” cross, the lines are a little fuzzy. Hmmm. 

Let’s try a little larger energy E = 1/8. The appearance of the 
section changes qualitatively (figure 3.20). For some trajectories 
there still appear to be extra constraints on the motion. But other 
trajectories appear to fill an area of the section plane, pretty much 
as we expected of trajectories if there was no extra integral. In 
particular, all of the scattered points on this section were gener- 
ated by a single trajectory. Thus, some trajectories behave as if 
there is an extra integral, and others don’t. Wow! 

Let’s go on to a higher energy E = 1/6, just at the escape 
energy. A section for this energy is shown in figure 3.21. Now, a 
single trajectory explores most of the region of the section plane 
allowed by energy conservation, but not entirely. There are still 
trajectories that appear to be subject to extra constraints. 
Figure 3.19 Surface of section for the Hénon-Heiles problem with energy E = 1/12.


We seem to have all possible worlds. At low energy, the system 
by and large behaves as if there is an extra integral, but not en- 
tirely. At intermediate energy, the phase space is divided: some 
trajectories explore areas whereas others are constrained. At high 
energy, trajectories explore most of the energy surface; few tra- 
jectories show extra constraints. We have just witnessed our first 
transition to chaos. 

There are two qualitatively different types of motion that are revealed by this surface of section, just as we saw in the Poincaré sections for the driven pendulum. There are trajectories that seem to be constrained as if by an extra integral. And there are trajectories that explore an area on the section as though there were no extra integrals. Regular trajectories appear to be constrained by an extra integral to a one-dimensional set on the section; chaotic trajectories are not constrained in this way and explore an area.³¹

31 As before, upon close examination we may find that trajectories that appear to be confined to a curve on the section are chaotic trajectories that explore a highly confined region. It is known, however, that some trajectories really are confined to curves on the section. Trajectories that start on these curves remain on these curves forever, and the trajectories fill these curves densely. These invariant curves are preserved by the dynamical evolution. There are also invariant subsets of curves with an infinite number of holes. We will explore the properties of these sets later.

Figure 3.20 Surface of section for the Hénon-Heiles problem with energy E = 1/8.

The surface of section not only reveals the existence of qualitatively different types of motion, but it also provides an overview of the different types of trajectories. Take the surface of section for E = 1/8 (figure 3.20). There are four main islands, engulfed in a chaotic sea. The particular trajectories displayed above provide examples from different parts of the section. The trajectory that loops around the bowl (figure 3.16) belongs to the large island on the left side of the section. Similar trajectories that loop around the bowl in the other direction belong to the large island on the right side of the section. The trajectories that oscillate back and forth across the bowl belong to the two islands above and below the center of the section. (By symmetry there should be three such islands. The third island is snugly wrapped against the boundary of the section.) Each of the main islands is surrounded by a chain of secondary islands. We will see that the types of orbits are inexhaustible, if we look closely enough. The chaotic trajectory (figure 3.17) lives in the chaotic sea. Thus the section provides a summary of the types of motion possible, and how they are related to one another. It is much more useful than plots of a zillion trajectories.

Figure 3.21 Surface of section for the Hénon-Heiles problem with energy E = 1/6. The section is clipped on the right.

The section for each energy summarizes the dynamics at that energy. A sequence of sections for various energies shows how the major features change. We have already noticed that at low energy the section is dominated by regular orbits, and at intermediate energy the section is divided more or less equally into regular and chaotic regions. At high energies the section is dominated by a single chaotic zone. We will see that such transitions from regular to chaotic behavior are quite common; similar phenomena occur in widely different systems, though the details naturally depend on the system under study.



3.6.4 Non-axisymmetric Top 


Figure 3.22 A surface of section for the non-axisymmetric top. The parameters are A = 0.0003 kg m², B = 0.00025 kg m², C = 0.0001 kg m², gMR = 0.0456 kg m² s⁻². The energy and $p_\varphi$ are those of the top initially standing vertically with rotation frequency 30 rad/s. The angle θ is on the abscissa, and the momentum $p_\theta$ is on the ordinate.


We have seen that the motion of an axisymmetric top can be essentially solved. A plot of the rate of change of the tilt angle versus the tilt angle is a simple closed curve. The evolution of the other angles describing the configuration can be obtained by quadrature once the tilting motion has been solved. Now let’s consider a non-axisymmetric top. A non-axisymmetric top is a top with three unequal moments of inertia. The pivot is not at the center of mass, so uniform gravity exerts a torque. We assume the line between the pivot and the center of mass is one of the principal axes, which we take to be ĉ. There are no torques about the vertical axis, so the vertical component of the angular momentum is conserved. If we write the Hamiltonian in terms of the Euler angles, the angle φ, which corresponds to rotation about the vertical, does not appear. Thus the momentum conjugate to this angle is conserved. The non-trivial degrees of freedom are θ and ψ, with their conjugate momenta.

We can make a surface of section (see figure 3.22) for this problem by displaying $p_\theta$ versus θ when ψ = 0. There are in general two values of $p_\psi$ possible for given values of energy and $p_\varphi$. We plot points only if the value of $p_\psi$ at the crossing is the larger of the two possibilities. This makes the points of the section correspond uniquely to a trajectory.

In this section there is a large quasiperiodic island surrounding a fixed point that corresponds to the tilted equilibrium point of the awake axisymmetric top (see figure 3.7). Surrounding this is a large chaotic zone that extends from θ = 0 to angles near π. If this top is placed initially near the vertical, it exhibits chaotic motion that carries it to large tilt angles. If the top is started within the quasiperiodic island, the tilt is stable.


3.7 Exponential Divergence 


Hénon and Heiles discovered that the chaotic trajectories had 
remarkable sensitivity to small changes in initial conditions— 
initially nearby chaotic trajectories separate roughly exponen- 
tially with time. On the other hand, regular trajectories do not 
exhibit this sensitivity—initially nearby regular trajectories sepa- 
rate roughly linearly with time. 

Consider the evolution of two initially nearby trajectories for the Hénon-Heiles problem, with energy E = 1/8. Let d(t) be the usual Euclidean distance in the $(x, y, p_x, p_y)$ space between the two trajectories at time t. Figure 3.23 shows the common logarithm of d(t)/d(0) as a function of time t. We see that the divergence is well described as exponential.

On the other hand, the distance between two initially nearby 
regular trajectories grows much more slowly. Figure 3.24 shows 
the distance between two regular trajectories as a function of time. 
The distance grows linearly with time. 

It is remarkable that Hamiltonian systems have such radically different types of trajectories. On the surface of section the chaotic and regular trajectories differ by the dimension of the space that they explore. It is very interesting that along with this dimensional difference there is a drastic difference in the way chaotic and regular trajectories separate. For higher-dimensional systems the surface of section technique is not as useful, but trajectories are still distinguished by the way neighboring trajectories diverge: some diverge exponentially whereas others diverge approximately linearly. Exponential divergence is the hallmark of chaotic behavior.

Figure 3.23 The common logarithm of the phase-space distance between two chaotic trajectories divided by the initial phase-space distance as a function of time. The initial distance was $10^{-10}$. The logarithm of the distance grows approximately linearly; the distance grows exponentially. The two-trajectory method saturates when the distance between trajectories becomes comparable to that allowed by conservation of energy. Also displayed is the distance between trajectories calculated by integrating the linearized variational equations. This method does not saturate.
The rate of exponential divergence is quantified by the slope of the graph of log(d(t)/d(0)). We can estimate the rate of exponential divergence of trajectories from a particular phase-space trajectory σ by choosing a nearby trajectory σ′ and computing

$$\gamma(t) = \frac{\log(d(t)/d(t_0))}{t - t_0}, \qquad (3.136)$$

where $d(t) = \|\sigma'(t) - \sigma(t)\|$. A problem with this method, the “two-trajectory” method, is illustrated in figure 3.23. For strongly chaotic trajectories two initially nearby trajectories soon find themselves as far apart as they can get. Once this happens the distance no longer grows. The estimate of the rate of divergence of trajectories is limited by this “saturation.”

Figure 3.24 The phase-space distance between two regular trajectories divided by the initial phase-space distance as a function of time. The initial distance was $10^{-10}$. The distance grows linearly.
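A minimal sketch of the two-trajectory estimate (3.136) for the Hénon-Heiles system, assuming a fixed-step Runge-Kutta integrator and a reference trajectory displaced by d0 along x. The function names are hypothetical. It uses the natural logarithm; figure 3.23 plots the common logarithm, which only rescales the slope by a constant factor. Because the result depends on a single final distance, it inherits the saturation problem just described when t_final is long.

```python
import math

def rhs(s):
    """Hamilton's equations for Henon-Heiles, s = (x, y, px, py)."""
    x, y, px, py = s
    return (px, py, -(x + 2.0 * x * y), -(y + x * x - y * y))

def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(s)
    k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
    k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
    k4 = rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def distance(a, b):
    """Euclidean distance in (x, y, px, py) space."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def two_trajectory_gamma(s0, d0, t_final, h=0.01):
    """gamma(t) = log(d(t)/d(t0)) / (t - t0), equation (3.136), from a
    reference trajectory and a copy displaced by d0 along x."""
    a = tuple(s0)
    b = (s0[0] + d0,) + tuple(s0[1:])
    n = int(round(t_final / h))
    for _ in range(n):
        a, b = rk4_step(a, h), rk4_step(b, h)
    return math.log(distance(a, b) / d0) / (n * h)

print(two_trajectory_gamma((0.0, 0.1, 0.49, 0.0), 1e-10, 50.0))
```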

We can improve on this method by studying a variational system of equations. Let

$$Dz(t) = F(t, z(t)) \qquad (3.137)$$

be the system of equations governing the evolution of the system. A nearby trajectory z′ satisfies

$$Dz'(t) = F(t, z'(t)). \qquad (3.138)$$

The difference between these trajectories, $\zeta = z' - z$, satisfies

$$D\zeta(t) = F(t, z'(t)) - F(t, z(t)) = F(t, z(t) + \zeta(t)) - F(t, z(t)). \qquad (3.139)$$

If ζ is small we can approximate the right-hand side by a derivative

$$D\zeta(t) = \partial_1 F(t, z(t))\, \zeta(t). \qquad (3.140)$$

This set of ordinary differential equations is called the variational equations for the system. It is linear in ζ, and driven by z.
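For the Hénon-Heiles system, $\partial_1 F$ is a 4 × 4 matrix that can be written down by differentiating Hamilton's equations. The sketch below (hypothetical names, fixed-step Runge-Kutta assumed) integrates z and ζ together as one 8-component system, then compares ζ with the scaled difference of two nearby trajectories as a check of (3.140).

```python
def rhs(z):
    """F(t, z) for Henon-Heiles; z = (x, y, px, py)."""
    x, y, px, py = z
    return (px, py, -(x + 2.0 * x * y), -(y + x * x - y * y))

def jacobian(z):
    """partial_1 F(t, z): derivative of F with respect to the state z."""
    x, y, _, _ = z
    return ((0.0, 0.0, 1.0, 0.0),
            (0.0, 0.0, 0.0, 1.0),
            (-1.0 - 2.0 * y, -2.0 * x, 0.0, 0.0),
            (-2.0 * x, -1.0 + 2.0 * y, 0.0, 0.0))

def combined_rhs(w):
    """Evolve z together with the variational equations (3.140),
    D zeta = partial_1 F(t, z) zeta; w packs (z, zeta) into 8 numbers."""
    z, zeta = w[:4], w[4:]
    J = jacobian(z)
    dzeta = tuple(sum(J[i][j] * zeta[j] for j in range(4)) for i in range(4))
    return rhs(z) + dzeta            # tuple concatenation: (Dz, D zeta)

def rk4(f, w, h):
    """One fourth-order Runge-Kutta step for Dw = f(w)."""
    k1 = f(w)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(w, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(w, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(w, k3)))
    return tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(w, k1, k2, k3, k4))

# Compare zeta(t) with (z'(t) - z(t)) / eps for a tiny displacement eps along x.
z0 = (0.0, 0.1, 0.49, 0.0)
eps = 1e-6
w = z0 + (1.0, 0.0, 0.0, 0.0)        # zeta(t0) = unit vector along x
a, b = z0, (z0[0] + eps,) + z0[1:]
for _ in range(100):                  # integrate to t = 1
    w = rk4(combined_rhs, w, 0.01)
    a, b = rk4(rhs, a, 0.01), rk4(rhs, b, 0.01)
print(w[4:])
print(tuple((bi - ai) / eps for ai, bi in zip(a, b)))
```

The two printed tuples should agree closely, since the linearization error in the scaled difference is of order eps.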

Let $d(t) = \|\zeta(t)\|$; then the rate of divergence can be estimated as before. The advantage of this “variational method” is that ζ can become arbitrarily large and its growth still measures the divergence of nearby trajectories. We can see in figure 3.23 that the variational method gives nearly the same result as the two-trajectory method up to the point at which the two-trajectory method saturates.³²

The Lyapunov exponent is defined to be the infinite time limit of γ(t), defined by equation (3.136), in which the distance d is computed by the variational method. Actually, for each trajectory there are many Lyapunov exponents, depending on the initial direction of the variation ζ. For an N-dimensional system, there are N Lyapunov exponents. For a randomly chosen $\zeta(t_0)$ the subsequent growth of ζ has components that grow with each of the Lyapunov exponents. In general, however, the growth of ζ will be dominated by the largest exponent. The largest Lyapunov exponent is thus interpreted as the typical rate of exponential divergence of nearby trajectories. The sum of the largest two Lyapunov exponents can be interpreted as the typical rate of growth of the area of two-dimensional elements. This can be extended to higher-dimensional elements. The rate of growth of volume elements is the sum of all the Lyapunov exponents.

For Hamiltonian systems there are constraints that the Lyapunov exponents must satisfy, which we will justify later. Lyapunov exponents come in pairs: for every Lyapunov exponent λ its negation −λ is also an exponent. For every conserved quantity, one of the Lyapunov exponents (and its negation) is zero. So the Lyapunov exponents can be used to check for the existence of conserved quantities. The sum of the Lyapunov exponents for a Hamiltonian system is zero, so volume elements do not exponentially grow. We will see in the next section that phase-space volume is actually conserved for Hamiltonian systems.


32 In strongly chaotic systems ζ may become so large that the computer can no longer represent it. To prevent this we can replace ζ by ζ/c whenever the size of ζ becomes uncomfortably large. The equation governing ζ is linear so, except for the scale change, the evolution is unchanged. Of course we have to keep track of these scale changes when computing the average growth rate. This process is called “renormalization” to make it sound impressive.
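The renormalization of footnote 32 can be sketched by extending the variational integration for the Hénon-Heiles system: every `renorm_every` steps ζ is rescaled to unit length and the logarithm of the discarded factor c is accumulated. The helper names are hypothetical, and the unit initial variation along x is an arbitrary choice; as the text notes, a generic ζ(t₀) grows at the rate of the largest exponent. The estimate approaches the Lyapunov exponent only as t_final grows.

```python
import math

def rhs(z):
    """F(t, z) for Henon-Heiles; z = (x, y, px, py)."""
    x, y, px, py = z
    return (px, py, -(x + 2.0 * x * y), -(y + x * x - y * y))

def jacobian(z):
    """partial_1 F(t, z) for Henon-Heiles."""
    x, y, _, _ = z
    return ((0.0, 0.0, 1.0, 0.0),
            (0.0, 0.0, 0.0, 1.0),
            (-1.0 - 2.0 * y, -2.0 * x, 0.0, 0.0),
            (-2.0 * x, -1.0 + 2.0 * y, 0.0, 0.0))

def combined_rhs(w):
    """(Dz, D zeta) with D zeta = partial_1 F(t, z) zeta."""
    z, zeta = w[:4], w[4:]
    J = jacobian(z)
    return rhs(z) + tuple(sum(J[i][j] * zeta[j] for j in range(4))
                          for i in range(4))

def rk4(f, w, h):
    """One fourth-order Runge-Kutta step for Dw = f(w)."""
    k1 = f(w)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(w, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(w, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(w, k3)))
    return tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(w, k1, k2, k3, k4))

def largest_lyapunov(z0, t_final, h=0.01, renorm_every=100):
    """Estimate the largest Lyapunov exponent with the variational method,
    renormalizing zeta and remembering each scale change."""
    w = tuple(z0) + (1.0, 0.0, 0.0, 0.0)       # |zeta(t0)| = 1
    log_scale = 0.0
    n = int(round(t_final / h))
    for step in range(1, n + 1):
        w = rk4(combined_rhs, w, h)
        if step % renorm_every == 0:
            c = math.sqrt(sum(v * v for v in w[4:]))
            log_scale += math.log(c)           # keep track of the rescaling
            w = w[:4] + tuple(v / c for v in w[4:])
    c = math.sqrt(sum(v * v for v in w[4:]))
    return (log_scale + math.log(c)) / (n * h)

print(largest_lyapunov((0.0, 0.1, 0.49, 0.0), 100.0))
```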
3.8 Liouville’s Theorem


If an ensemble of states occupies a particular volume of phase 
space at one moment, then the subsequent evolution of that vol- 
ume by the flow described by Hamilton’s equations may distort 
the ensemble but it does not change the volume the ensemble oc- 
cupies. That phase-space volume is preserved by the phase flow 
is called Liouville’s Theorem. 

We will first illustrate the preservation of phase-space volume 
with a simple example and then prove it in general. 


The phase flow for the pendulum 
Consider an undriven pendulum described by the Hamiltonian: 


$$H(t, \theta, p_\theta) = \frac{p_\theta^2}{2ml^2} - glm \cos\theta. \qquad (3.141)$$

In figure 3.25 we see the evolution of an elliptic region around a point on the θ-axis, in the oscillation region of the pendulum.
Three later positions of the region are shown. The region is 
stretched and sheared by the flow, but the area is preserved. After 
many cycles, the starting region will be stretched to be a thin layer 
distributed in the phase angle of the pendulum. Figure 3.26 shows 
a similar evolution (for smaller time intervals) of a region straddling the separatrix³³ near the unstable equilibrium point. The
phase-space region rapidly stretches along the separatrix, while 
preserving the area. The initial conditions that start in the oscil- 
lation region (inside of the separatrix) will continue to spread into 
a thin ring-shaped region, while the initial conditions that start 
outside of the separatrix will spread into a thin region of rotation 
on the outside of the separatrix. 
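The area preservation in figure 3.25 can be checked numerically: evolve a closed ring of boundary points with Hamilton's equations for (3.141) and compare shoelace areas before and after. A sketch under stated assumptions: unit mass, l = 1 m and g = 9.8 m/s² as in the figure, but the center (θ = 1, p_θ = 0) and semi-axes 0.3 and 0.6 of the initial "ellipse" are illustrative choices, not the figure's values. The polygon is only an approximation of the evolved boundary, so the areas agree to discretization accuracy, not exactly.

```python
import math

G, L, M = 9.8, 1.0, 1.0      # gravity (m/s^2), length (m), mass (assumed unit)

def pendulum_rhs(s):
    """Hamilton's equations for H = p^2/(2 m l^2) - g l m cos(theta)."""
    theta, p = s
    return (p / (M * L * L), -G * L * M * math.sin(theta))

def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = pendulum_rhs(s)
    k2 = pendulum_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
    k3 = pendulum_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
    k4 = pendulum_rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def shoelace_area(points):
    """Area enclosed by a closed polygon whose vertices are given in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

N = 400
ring = [(1.0 + 0.3 * math.cos(2.0 * math.pi * k / N),
         0.6 * math.sin(2.0 * math.pi * k / N)) for k in range(N)]
area0 = shoelace_area(ring)

h, steps = 0.01, 90                   # evolve every boundary point for 0.9 s
for _ in range(steps):
    ring = [rk4_step(s, h) for s in ring]
area1 = shoelace_area(ring)
print(area0, area1, abs(area1 - area0) / area0)
```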


Proof of Liouville’s theorem 
Consider a set of ordinary differential equations of the form 


Dz(t) = F(t, z(t)), (3.142) 


where z is a tuple of N state variables. Let $R(t_1)$ be a region of the state space at time $t_1$. Each element of this region is an initial condition at time $t_1$ for the system. Each element evolves to an element at time $t_2$ according to the differential equations. The set of these elements at time $t_2$ is the region $R(t_2)$. Regions evolve to regions.

33 The separatrix is the curve that separates the oscillating motion from the circulating motion. It is made up of several trajectories that are asymptotic to the unstable equilibrium.

Figure 3.25 A swarm of initial points outlining an area in the phase space of the pendulum deforms as it evolves, but the area contained in the contour remains constant. The horizontal axis is the angle of the pendulum from the vertical. The vertical axis is the angular momentum. The initial contour is the “ellipse” on the abscissa. The pendulum has length 1 meter in standard gravity (9.8 meter/second²), so its period is approximately 2 seconds. The flow proceeds clockwise and the deformed areas are shown at 0.9 seconds, 1.8 seconds, and 2.7 seconds. The successive positions exhibit “shearing” of the region due to the fact that the pendulum is not isochronous.

The evolution of the system for a time interval Δt defines a map $g_{t,\Delta t}$ from the state space to itself:

$$g_{t,\Delta t}(z(t)) = z(t + \Delta t). \qquad (3.143)$$

Regions map to regions by mapping each element in the region:

$$g_{t,\Delta t}(R(t)) = R(t + \Delta t). \qquad (3.144)$$
Figure 3.26 The pendulum here is the same as in the previous figure, but now the swarm of initial points surrounds the unstable equilibrium point for the pendulum in phase space, where θ = π and $p_\theta = 0$. The swarm is stretched out along the separatrix. The time interval between successively plotted contours is 0.3 seconds.


The volume V(t) of a region R(t) is $\int_{R(t)} 1$. The volume of the evolved region $R(t + \Delta t)$ is

$$V(t + \Delta t) = \int_{R(t+\Delta t)} 1 = \int_{g_{t,\Delta t}(R(t))} 1 = \int_{R(t)} \mathrm{Jac}(g_{t,\Delta t}), \qquad (3.145)$$

where $\mathrm{Jac}(g_{t,\Delta t})$ is the Jacobian of the mapping $g_{t,\Delta t}$. The Jacobian is the determinant of the derivative of the mapping.
For small Δt

$$g_{t,\Delta t}(z(t)) = z(t) + \Delta t\, F(t, z(t)) + O(\Delta t^2), \qquad (3.146)$$

so

$$Dg_{t,\Delta t}(z(t)) = I + \Delta t\, \partial_1 F(t, z(t)) + O(\Delta t^2). \qquad (3.147)$$

We can use the fact that if A is an N × N square matrix then

$$\det(I + \epsilon A) = 1 + \epsilon\, \mathrm{tr}\, A + O(\epsilon^2) \qquad (3.148)$$

to show that

$$\mathrm{Jac}(g_{t,\Delta t})(z) = 1 + \Delta t\, G_t(z) + O(\Delta t^2), \qquad (3.149)$$

where

$$G_t(z) = \mathrm{tr}(\partial_1 F(t, z)). \qquad (3.150)$$

Thus

$$V(t + \Delta t) = \int_{R(t)} \left[1 + \Delta t\, G_t + O(\Delta t^2)\right] = V(t) + \Delta t \int_{R(t)} G_t + O(\Delta t^2). \qquad (3.151)$$

So the rate of change of the volume at time t is

$$DV(t) = \int_{R(t)} G_t. \qquad (3.152)$$


Now we compute $G_t$ for a system described by a Hamiltonian H. The components of z are the components of the coordinates and the momenta: $z^k = q^k$, $z^{k+n} = p_k$ for k = 0, ..., n − 1. The components of F are

$$F^k(t, z) = (\partial_2 H)^k(t, q, p)$$
$$F^{k+n}(t, z) = -(\partial_1 H)_k(t, q, p), \qquad (3.153)$$

for k = 0, ..., n − 1. The diagonal components of the derivative $\partial_1 F$ are

$$(\partial_1)_k F^k(t, z) = (\partial_1)_k (\partial_2)^k H(t, q, p)$$
$$(\partial_1)_{k+n} F^{k+n}(t, z) = -(\partial_2)^k (\partial_1)_k H(t, q, p). \qquad (3.154)$$


The component partial derivatives commute, so the diagonal components with index k and index k + n are equal and opposite. We see that the trace, which is the sum of these diagonal components, is zero. Thus the integral of $G_t$ over the region R(t) is zero, so the derivative of the volume at time t is zero. Because t is arbitrary, the volume does not change. This proves Liouville’s theorem: the phase-space flow conserves phase-space volume.
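The cancellation in (3.154) can be watched numerically. The sketch below builds F from an arbitrary Hamiltonian by central differences, then takes the trace of $\partial_1 F$ by differencing again. The helper names are hypothetical, and so is the Hamiltonian `H_mixed` = q p² + q² p, chosen because its individual diagonal entries ±(2p + 2q) are non-zero, so only their sum vanishes. For the Hénon-Heiles Hamiltonian each diagonal entry happens to vanish separately.

```python
def hamiltonian_vector_field(H, n, eps=1e-6):
    """F(z) = ((d2 H)^k, -(d1 H)_k) for z = (q_0..q_{n-1}, p_0..p_{n-1}),
    with the partial derivatives of H taken by central differences."""
    def partial(z, i):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        return (H(zp) - H(zm)) / (2.0 * eps)
    def F(z):
        qdot = [partial(z, n + k) for k in range(n)]    # F^k = (d2 H)^k
        pdot = [-partial(z, k) for k in range(n)]       # F^{k+n} = -(d1 H)_k
        return qdot + pdot
    return F

def divergence(F, z, eps=1e-4):
    """tr(d1 F) at z, i.e. G_t(z) of (3.150), by central differences."""
    total = 0.0
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        total += (F(zp)[i] - F(zm)[i]) / (2.0 * eps)
    return total

def H_mixed(z):
    """Hypothetical one-degree-of-freedom Hamiltonian q p^2 + q^2 p."""
    q, p = z
    return q * p * p + q * q * p

def H_henon_heiles(z):
    x, y, px, py = z
    return 0.5 * (px * px + py * py) + 0.5 * (x * x + y * y) + x * x * y - y ** 3 / 3.0

F_mixed = hamiltonian_vector_field(H_mixed, 1)
F_hh = hamiltonian_vector_field(H_henon_heiles, 2)
print(divergence(F_mixed, [0.5, 0.7]))             # near zero
print(divergence(F_hh, [0.3, -0.2, 0.1, 0.4]))     # near zero
```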

Notice that the proof of Liouville’s theorem does not depend 
upon whether the Hamiltonian has explicit time dependence. Li- 
ouville’s theorem holds for systems with time-dependent Hamil- 
tonians. 

We may think of the ensemble of all possible states as a fluid 
flowing around under the control of the dynamics. Liouville’s theo- 
rem says that this fluid is incompressible for Hamiltonian systems. 


Exercise 3.11: Determinants and traces 


Show that equation (3.148) is correct. 


Area preservation of stroboscopic surfaces of section
Surfaces of section for periodically driven Hamiltonian systems are area preserving if the section coordinates are the phase-space coordinate and momentum. This is an important feature of surfaces of section. It is a consequence of Liouville’s theorem for one-degree-of-freedom problems.

It is also the case that surfaces of section such as those we have 
used for the Hénon-Heiles problem are area preserving, but we are 
not ready to prove this yet! 


Poincaré recurrence 

There is a remarkable theorem which is a trivial consequence of 
Liouville’s theorem—the Poincaré recurrence theorem. Loosely, 
the theorem states that almost all trajectories eventually return 
arbitrarily close to where they started. This is true regardless of 
whether the trajectories are chaotic or regular. 

More precisely, consider a Hamiltonian dynamical system for which the phase space is a bounded domain D. We identify some initial point in the phase space, say, $z_0$. Then, for any finite neighborhood U of $z_0$ that we choose, there are trajectories which emanate from initial points in that neighborhood that eventually return to the neighborhood.

We can prove this by considering the successive images of $U$ under the time evolution. For simplicity, we restrict consideration to time evolution for a time interval $\Delta$. The map of the phase space onto itself generated by time evolution for an interval $\Delta$ we call $C$. Subsequent applications of the map generate a discrete time evolution. Sets of points in phase space transform by evolving all the points in the set; the image of the set $U$ is denoted $C(U)$. Now consider the trajectory of the set $U$, that is, the sets $C^n(U)$, where $C^n$ indicates the $n$-fold composition of $C$. There are two possibilities: either the successive images $C^i(U)$ intersect or they do not. By Liouville’s theorem each image has the same volume as $U$. If the images do not intersect, then with each iteration a volume of $D$ equal to the volume of $U$ gets “used up” and cannot belong to any further image. But the volume of $D$ is finite, so we cannot fit an infinite number of non-intersecting finite volumes into it. Therefore, after some number of iterations the images intersect. Suppose $C^i(U)$ intersects $C^j(U)$, with $j < i$ for definiteness. Then the preimages must also intersect, since the preimage of a point in the intersection belongs to both $C^{i-1}(U)$ and $C^{j-1}(U)$. This can be continued until finally we have that $C^{i-j}(U)$ intersects $U$. So we have proven that after $i - j$ iterations of the map $C$ there is a set of points initially in $U$ that return to the neighborhood $U$.
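The counting argument in this proof can be mimicked on a finite grid. The sketch below is an analogy and not part of the text. Phase space is replaced by the $N \times N$ cells of a discrete torus, volume by the number of cells, and $C$ by Arnold’s cat map $(x, y) \mapsto (2x + y, x + y) \bmod N$. That map is invertible because its matrix has determinant 1. The code iterates images of a block $U$ until some image meets an earlier one, as in the proof, and then checks that $C^{i-j}(U)$ meets $U$. Because the images $C^0(U), \ldots, C^{i-1}(U)$ are pairwise disjoint, $i\,|U| \le N^2$, which is the discrete version of the finite-volume bound.

```python
def cat_map(cell, N):
    """Arnold's cat map on an N x N grid of cells. Its matrix [[2, 1], [1, 1]] has
    determinant 1, so the map is a bijection of the grid and preserves cell counts."""
    x, y = cell
    return ((2 * x + y) % N, (x + y) % N)

def image(cells, N):
    """C(U): move every cell of the set U."""
    return frozenset(cat_map(c, N) for c in cells)

def first_overlap(U, N):
    """Iterate images of U until C^i(U) meets an earlier image C^j(U), j < i.
    Returns (i, j). The loop must stop: only N*N // len(U) disjoint images fit."""
    images = [frozenset(U)]
    while True:
        new = image(images[-1], N)
        for j, earlier in enumerate(images):
            if new & earlier:
                return len(images), j
        images.append(new)

def iterate(U, N, n):
    """C^n(U)."""
    cells = frozenset(U)
    for _ in range(n):
        cells = image(cells, N)
    return cells

N = 101
U = {(x, y) for x in range(40, 43) for y in range(60, 63)}   # a 3 x 3 block of cells
i, j = first_overlap(U, N)
returned = iterate(U, N, i - j) & frozenset(U)   # cells of U that are back in U
print(i, j, sorted(returned))
```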


The gas in the corner of the room 

Suppose we have a collection of N classical atoms in a perfectly sealed room. The phase-space dimension of this system is 6N. A point in this phase space is denoted z. Suppose initially all the atoms are, say, within one centimeter of one corner, with arbitrarily chosen finite velocities. This corresponds to some initial point $z_0$ in the phase space. The phase space of the system is limited in space by the box, and in momentum by energy conservation; the phase space is bounded. The recurrence theorem then says that in the neighborhood of $z_0$ there is an initial condition of the system that returns to the neighborhood of $z_0$ after some time. For the individual atoms this means that after some time all of the atoms will be found in the corner of the room again, and again, and again. Makes one wonder about the second law of thermodynamics, doesn’t it?³⁴


³⁴ It is reported that when Boltzmann was confronted with this problem he responded, “You should wait that long!”
Non-existence of attractors in Hamiltonian systems
Some systems have attractors. An attractor is a region of phase 
space that gobbles volumes of trajectories. For an attractor there 
is some larger region, the basin of attraction, such that sets of tra- 
jectories with non-zero volume eventually end up in the attractor 
and never leave it. The recurrence theorem shows that Hamilto- 
nian systems with bounded phase space do not have attractors. 
Consider some candidate volume in the proposed basin of attrac- 
tion. The recurrence theorem guarantees that some trajectories in 
the candidate volume return to the volume repeatedly. Therefore, 
the volume is not in a basin of attraction. Attractors do not exist 
in Hamiltonian systems with bounded phase space. 

This does not mean that every trajectory always returns. A 
simple example is the pendulum. Suppose we take a blob of tra- 
jectories that spans the separatrix, the trajectory that asymp- 
totically approaches the unstable equilibrium with the pendulum 
pointed up. Trajectories with more energy than the separatrix 
make a full loop around and return to their initial point; trajecto- 
ries with lower energy than the separatrix oscillate once across and 
back to their initial position; but the separatrix trajectory itself 
leaves the initial region permanently, and continually approaches 
the unstable point. 


Conservation of phase volume in a dissipative system 
The definition of a dissipative system is not so clear. For some, 
“dissipative” implies that phase-space volume is not conserved, 
which is the same as saying the evolution of the system is not 
governed by Hamilton’s equations. For others, “dissipative” im- 
plies friction is present: representing loss of energy to unmodelled 
degrees of freedom. Here is a curious example. The damped har- 
monic oscillator is the paradigm of a dissipative system. Here we 
show that the damped harmonic oscillator can be described by 
Hamilton’s equations and that phase-space volume is conserved. 

The damped harmonic oscillator is governed by the ordinary 
differential equation 


$$m D^2 x + \alpha D x + k x = 0, \tag{3.155}$$

where $\alpha$ is a coefficient of damping. We can formulate this system with the Lagrangian³⁵

$$L(t, x, v) = \left(\frac{m}{2} v^2 - \frac{k}{2} x^2\right) e^{(\alpha/m) t}. \tag{3.156}$$

The Lagrange equation for this Lagrangian is

$$\left(m D^2 x + \alpha D x + k x\right) e^{(\alpha/m) t} = 0. \tag{3.157}$$


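Since the exponential is never zero, this equation has the same trajectories as equation (3.155) above.
The momentum conjugate to $x$ is

$$p = m v\, e^{(\alpha/m) t}, \tag{3.158}$$

and the Hamiltonian is

$$H(t, x, p) = \frac{p^2}{2m} e^{-(\alpha/m) t} + \frac{k}{2} x^2 e^{(\alpha/m) t}. \tag{3.159}$$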
For this system, the Hamiltonian is not the sum of the kinetic energy of the motion of the mass and the potential energy stored in the spring. The value of the Hamiltonian is not conserved ($\partial_0 H \neq 0$). Hamilton’s equations are

$$\begin{aligned} Dx(t) &= \frac{p(t)}{m} e^{-(\alpha/m) t} \\ Dp(t) &= -k\, x(t)\, e^{(\alpha/m) t}. \end{aligned} \tag{3.160}$$
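A quick numerical check that (3.160) really describes the damped oscillator is to integrate two systems from matching initial conditions. The first is Hamilton’s equations in $(x, p)$. The second is the original equation (3.155), written as a first-order system in $(x, v)$. The coordinates should agree, and the momenta should agree once $p$ is related to $v$ by (3.158). The parameter values are the numerical case used in the text. The fixed-step Runge-Kutta integrator, the step count, and the initial state are illustrative choices.

```python
import math

m, k, alpha = 5.0, 0.25, 3.0  # the numerical case used in the text

def hamilton_rhs(t, z):
    """Hamilton's equations (3.160): z = (x, p)."""
    x, p = z
    return [p / m * math.exp(-alpha / m * t), -k * x * math.exp(alpha / m * t)]

def newton_rhs(t, z):
    """The damped oscillator (3.155) as a first-order system: z = (x, v)."""
    x, v = z
    return [v, -(alpha * v + k * x) / m]

def rk4_step(f, t, z, h):
    """One classical fourth-order Runge-Kutta step for Dz = f(t, z)."""
    k1 = f(t, z)
    k2 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
    k3 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
    k4 = f(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def integrate(f, z0, T, steps=4000):
    """Integrate Dz = f(t, z) from t = 0 to t = T."""
    h, t, z = T / steps, 0.0, list(z0)
    for _ in range(steps):
        z = rk4_step(f, t, z, h)
        t += h
    return z

T = 10.0
x0, v0 = 1.0, 0.0
xH, pH = integrate(hamilton_rhs, [x0, m * v0], T)  # p(0) = m v(0), by (3.158) at t = 0
xN, vN = integrate(newton_rhs, [x0, v0], T)
print(xH, xN)                                      # same coordinate: the motion decays
print(pH, m * vN * math.exp(alpha / m * T))        # same momentum, via (3.158)
```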


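Let’s consider a numerical case. Let $m = 5$, $k = 1/4$, $\alpha = 3$. Here the characteristic roots of the linear constant-coefficient ordinary differential equation (3.155) are $s = -1/10, -1/2$. Thus the solutions are

$$\begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = \begin{pmatrix} e^{-t/10} & e^{-t/2} \\ -\frac{1}{2} e^{t/2} & -\frac{5}{2} e^{t/10} \end{pmatrix} \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}, \tag{3.161}$$

for $A_1$ and $A_2$ determined by the initial conditions:

$$\begin{pmatrix} A_1 \\ A_2 \end{pmatrix} = \begin{pmatrix} \frac{5}{4} & \frac{1}{2} \\ -\frac{1}{4} & -\frac{1}{2} \end{pmatrix} \begin{pmatrix} x(0) \\ p(0) \end{pmatrix}. \tag{3.162}$$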


³⁵ This is just the product of the Lagrangian for the undamped harmonic oscillator with an increasing exponential of time.




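Thus we can form the transformation from the initial state to the final state:

$$\begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = \begin{pmatrix} \frac{5}{4} e^{-t/10} - \frac{1}{4} e^{-t/2} & \frac{1}{2} e^{-t/10} - \frac{1}{2} e^{-t/2} \\ \frac{5}{8} e^{t/10} - \frac{5}{8} e^{t/2} & \frac{5}{4} e^{t/10} - \frac{1}{4} e^{t/2} \end{pmatrix} \begin{pmatrix} x(0) \\ p(0) \end{pmatrix}. \tag{3.163}$$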


The transformation is linear, so the area is transformed by the 
determinant, which is 1 in this case. Thus, contrary to intu- 
ition, the phase-space volume is conserved. So why is this not 
a contradiction with the statement that there are no attractors 
in Hamiltonian systems? The answer is that the Poincaré recur- 
rence argument is only true for bounded phase spaces. Here, the 
momentum expands exponentially with time (as the coordinate 
contracts), so it is unbounded. 
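The determinant claim can be checked directly from (3.163). The sketch below evaluates the matrix of (3.163) for $m = 5$, $k = 1/4$, $\alpha = 3$ at a few times. The determinant stays at 1. Meanwhile the entries acting on the coordinate decay and the entries producing the momentum grow exponentially, and it is this unbounded growth that defeats the recurrence argument. The helper names are illustrative.

```python
import math

def transfer_matrix(t):
    """Matrix of (3.163) for m = 5, k = 1/4, alpha = 3: maps (x(0), p(0)) to (x(t), p(t))."""
    a, b = math.exp(-t / 10), math.exp(-t / 2)   # decaying exponentials
    c, d = math.exp(t / 2), math.exp(t / 10)     # growing exponentials
    return [[1.25 * a - 0.25 * b, 0.5 * a - 0.5 * b],
            [0.625 * d - 0.625 * c, 1.25 * d - 0.25 * c]]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

for t in (0.0, 1.0, 10.0, 30.0):
    M = transfer_matrix(t)
    # the determinant stays 1; as t grows the x row decays and the p row blows up
    print(t, det2(M), M[0], M[1])
```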

We shouldn’t really be too surprised by the way the theory 
protects itself from an apparent paradox—that the phase volume 
is conserved even though all trajectories decay to zero velocity 
and coordinates. The proof of Liouville’s theorem allows for time- 
varying Hamiltonians. In this case we are able to model the dis- 
sipation by just such a time-varying Hamiltonian. 


Exercise 3.12: Time-varying systems 


To make the fact that Liouville’s theorem holds for time-varying sys- 
tems even more concrete, extend the results of section 3.8 to show how 
a swarm of initial points outlining an area in the phase space of the 
driven pendulum deforms as it evolves. Construct pictures analogous 
to figures 3.25 and 3.26 for one of the interesting cases where we have 
surfaces of section. Does the distortion look different in different parts 
of the phase space? How? 


Distribution functions 

We only know the state of a system approximately. It is reasonable 
to model our state of knowledge by a probability density function 
on the set of possible states. Given such incomplete knowledge, 
what are the probable consequences? As the system evolves, the 
density function also evolves. Liouville’s theorem gives us a handle 
on this kind of problem. 

Let $f(t, q, p)$ be a probability density function on the phase space at time $t$. For this to be a good probability density function we require that the integral of $f$ over all coordinates and momenta is 1: it is certain that the system is somewhere.
There is a set of trajectories that pass through any particular region of phase space at a particular time. These trajectories are neither created nor destroyed, and they proceed as a bundle to another region of phase space at a later time. Liouville’s theorem tells us that the volume of the source region is the same as the volume of the target region, so the density must remain constant along each trajectory. Thus $D(f \circ \sigma) = 0$, where $\sigma(t) = (t, q(t), p(t))$ is the phase-space path of the system. If we have a system described by the Hamiltonian $H$ then

$$D(f \circ \sigma) = \partial_0 f \circ \sigma + \{f, H\} \circ \sigma, \tag{3.164}$$

so we may conclude that

$$\partial_0 f \circ \sigma + \{f, H\} \circ \sigma = 0. \tag{3.165}$$


This linear partial differential equation governs the evolution of 
the density function, and thus shows how our state of knowledge 
evolves. 
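Equation (3.165) can be checked against an explicit solution. As a hypothetical example, take the unit harmonic oscillator $H(q, p) = (q^2 + p^2)/2$. Its flow rotates the phase plane: $q(t) = q_0\cos t + p_0\sin t$ and $p(t) = -q_0\sin t + p_0\cos t$. A density that is simply carried along by the flow is $f(t, q, p) = f_0(q\cos t - p\sin t,\; q\sin t + p\cos t)$, where the initial density $f_0$ is arbitrary (a Gaussian blob here). Central differences then show $\partial_0 f + \{f, H\}$ vanishing up to differencing error. The bracket is taken as $\{f, H\} = \partial_1 f\, \partial_2 H - \partial_2 f\, \partial_1 H$, the sign convention that makes (3.164) the chain rule along Hamilton’s equations.

```python
import math

def H(q, p):
    """Hypothetical example Hamiltonian: a unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)

def f0(q, p):
    """Initial density: a normalized Gaussian blob centred at (q, p) = (1, 0)."""
    return math.exp(-((q - 1.0) ** 2 + p ** 2) / 0.1) / (0.1 * math.pi)

def f(t, q, p):
    """Density at time t: f0 carried along the flow, which rotates the phase plane.
    (q0, p0) below is the state at time 0 that flows to (q, p) at time t."""
    q0 = q * math.cos(t) - p * math.sin(t)
    p0 = q * math.sin(t) + p * math.cos(t)
    return f0(q0, p0)

def d(fun, i, args, h=1e-5):
    """Central difference of fun with respect to its i-th argument."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (fun(*up) - fun(*dn)) / (2 * h)

def liouville_residual(t, q, p):
    """Left side of (3.165) at (t, q, p): partial_0 f + {f, H}."""
    bracket = d(f, 1, (t, q, p)) * d(H, 1, (q, p)) - d(f, 2, (t, q, p)) * d(H, 0, (q, p))
    return d(f, 0, (t, q, p)) + bracket

print(d(f, 0, (0.7, 0.8, -0.6)))            # partial_0 f by itself is far from zero
print(liouville_residual(0.7, 0.8, -0.6))   # but the full left side of (3.165) is tiny
```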


3.9 Standard Map 


We have seen that the surfaces of section for a number of different 
problems are qualitatively very similar. They all show two qual- 
itatively different types of motion: regular motion and chaotic 
motion. They show that these types of orbits are clustered; there 
are regions of the surface of section which have mostly regular 
trajectories and other regions dominated by chaotic behavior. We 
have also seen a transition to large-scale chaotic behavior as some 
parameter is varied. Now we have learned that the map that takes 
points on a two-dimensional surface of section to new points on the 
surface of section is area-preserving. The sole property that these 
maps of the section onto itself have in common (that we know of 
at this point) is that they preserve area. Otherwise they are quite 
distinct. Suppose we consider an abstract map of the section onto 
itself that is area-preserving, without regard for whether the map 
is generated by some dynamical system. Do area-preserving maps 
show similar phenomena, or is the dynamical origin of the map
crucial to the phenomena we have found?³⁶


³⁶ This question was also addressed in the remarkable paper by Hénon and Heiles, but with a different map than we use here.
Consider a map of the phase plane onto itself defined in terms of the dynamical variable $\theta$ and its “conjugate momentum” $I$. The map is

$$I' = (I + K \sin\theta) \bmod 2\pi \tag{3.166}$$

$$\theta' = (\theta + I') \bmod 2\pi. \tag{3.167}$$

This map is known as the “standard map.”³⁷ A curious feature of the standard map is that the momentum variable $I$ is treated as an angular quantity. The derivative of the map has determinant one, implying the map is area preserving.
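Writing out the derivative makes the determinant claim explicit. The mod $2\pi$ reductions only shift values by constants, so away from the cuts they do not change derivatives. There $\theta' = \theta + I + K\sin\theta$ and $I' = I + K\sin\theta$. Ordering the variables as $(\theta, I)$:

$$\frac{\partial(\theta', I')}{\partial(\theta, I)} = \begin{pmatrix} 1 + K\cos\theta & 1 \\ K\cos\theta & 1 \end{pmatrix}, \qquad \det = (1 + K\cos\theta)\cdot 1 - 1\cdot K\cos\theta = 1.$$

The determinant is 1 for every $K$ and every $\theta$, so area preservation does not depend on the parameter.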

We can implement the standard map: 


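(define ((standard-map K) theta I return failure)
  ;; nI is the new momentum I' = I + K sin(theta), before reduction mod 2pi
  (let ((nI (+ I (* K (sin theta)))))
    ;; pass the reduced pair (theta', I') to the return continuation
    (return ((principal-value 2pi) (+ theta nI))
            ((principal-value 2pi) nI))))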
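For readers without the Scheme environment, here is a Python transcription of the procedure, offered as a sketch. The function names are ours. The return/failure continuations are replaced by an ordinary return value. Python’s `%` operator stands in for `(principal-value 2pi)`, which is assumed here to reduce into the interval from 0 to $2\pi$, matching the plotting window `(frame 0.0 2pi 0.0 2pi)`. The sketch also checks the determinant-one claim with central differences, at points where the mod $2\pi$ cuts are not crossed.

```python
import math

TWO_PI = 2.0 * math.pi

def standard_map(K):
    """Return one step of the standard map (3.166)-(3.167) for parameter K.
    No failure continuation is needed: the standard map is defined everywhere."""
    def step(theta, I):
        nI = I + K * math.sin(theta)                 # I' before reduction
        return (theta + nI) % TWO_PI, nI % TWO_PI    # (theta', I'), reduced mod 2*pi
    return step

def orbit(K, theta, I, n):
    """Collect n successive section points (theta, I) of one orbit."""
    step = standard_map(K)
    points = []
    for _ in range(n):
        theta, I = step(theta, I)
        points.append((theta, I))
    return points

def jacobian_det(step, theta, I, h=1e-6):
    """Central-difference determinant of the map's derivative at (theta, I).
    Only meaningful away from the mod-2*pi cuts."""
    tp, Ip = step(theta + h, I)
    tm, Im = step(theta - h, I)
    d_theta = ((tp - tm) / (2 * h), (Ip - Im) / (2 * h))   # d(theta', I') / d(theta)
    tp, Ip = step(theta, I + h)
    tm, Im = step(theta, I - h)
    d_I = ((tp - tm) / (2 * h), (Ip - Im) / (2 * h))       # d(theta', I') / d(I)
    return d_theta[0] * d_I[1] - d_I[0] * d_theta[1]

points = orbit(0.6, 1.0, 1.0, 2000)               # 2000 section points of one orbit, K = 0.6
print(jacobian_det(standard_map(0.6), 1.0, 1.0))  # close to 1
```

Plotting `points` for several starting values would give a picture like the one `explore-map` builds interactively. The plotting itself is left out to keep the sketch dependency-free.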


We use the explore-map procedure introduced earlier to use a 
pointing device to interactively explore the surface of section. For 
example, to explore the surface of section for parameter K = 0.6 
we use: 


(define window (frame 0.0 2pi 0.0 2pi)) 
(explore-map window (standard-map 0.6) 2000) 


The resulting surface of section, for a variety of orbits chosen
with the pointer, is shown in figure 3.27. The surface of section
does indeed look qualitatively similar to the surfaces of section 
generated by dynamical systems. 

The surface of section for K = 1.4 (as shown in figure 3.28) is 
dominated by a large chaotic zone. The standard map exhibits a 
transition to large-scale chaos near K = 1. So this abstract area- 
preserving map of the phase plane onto itself shows behavior that 
is similar to behavior in the sections generated by a Hamiltonian 
dynamical system. Evidently, the area preservation property of 
the dynamics in the phase space plays a determining role for many 
interesting properties of trajectories of mechanical systems. 


³⁷ The standard map has been extensively studied. Early investigations were
by Chirikov [11] and by Taylor [41]. So the map is sometimes called the 
Chirikov-Taylor map. Chirikov coined the term “standard map,” which we 
adopt. 


Figure 3.27 Surface of section for the standard map for K = 0.6. The 
section shows mostly regular trajectories, with a few dominant islands, 
but also shows a number of small chaotic zones. 


Exercise 3.13: Fun with Hénon’s quadratic map


Consider the map of the plane defined by the equations:

$$\begin{aligned} x' &= x\cos\alpha - (y - x^2)\sin\alpha \\ y' &= x\sin\alpha + (y - x^2)\cos\alpha \end{aligned}$$


a. Show that the map preserves area. 


b. Implement the map as a procedure. The interesting range of x and y 
is (−1, 1). There will be orbits that escape. You should check for values
of x and y that escape from this range and call the fail continuation 
when this occurs. 


c. Explore the phase portrait of this map for a few values of the parameter α. The map is particularly interesting for α = 1.32 and α = 1.2. What happens in between?


Figure 3.28 Surface of section for the standard map for K = 1.4. 
The dominant feature is a large chaotic zone. There are also some large 
islands of regular behavior. In this case there are also some interesting 
secondary islands: islands around islands.


3.10 Summary 


Lagrange’s equations are a system of n second order ordinary dif- 
ferential equations in the time, the generalized coordinates, the 
generalized velocities, and the generalized accelerations. Trajec- 
tories are determined by the coordinates and the velocities at a 
moment. 

Hamilton’s equations specify the dynamics as a system of first- 
order ordinary differential equations in the time, the generalized 
coordinates, and the conjugate momenta. Phase-space trajectories 
are determined by an initial point in phase space at a moment. 

The Hamiltonian formulation and the Lagrangian formulation 
are equivalent in that equivalent initial conditions produce the 
same configuration path. 

If there is a symmetry of the problem that is naturally expressed as a cyclic coordinate, then the conjugate momentum is conserved. In the Hamiltonian formulation, such a symmetry naturally results in the reduction of the dimension of the phase space of the difficult part of the problem. If there are enough symmetries, then the problem of determining the time evolution may be reduced to evaluation of definite integrals (reduced to quadratures).

Systems without enough symmetries to be reducible to quadra- 
tures may be effectively studied with the surface of section tech- 
nique. This is particularly advantageous in systems for which the 
reduced problem has two degrees of freedom or has one degree of 
freedom with explicit periodic time dependence. 

Surfaces of section reveal tremendous structure in the phase 
space. There are chaotic zones and islands of regular behavior. 
There are interesting transitions, as parameters are varied, from mostly regular motion to mostly chaotic motion.

Chaotic trajectories exhibit sensitive dependence on initial con- 
ditions, separating exponentially from nearby trajectories. Reg- 
ular trajectories do not show such sensitivity. Curiously, chaotic 
trajectories are distinguished both by the dimension of the space 
they explore and by their exponential divergence. 

The time evolution of a 2n-dimensional region in phase space 
preserves the volume. Hamiltonian flow is “incompressible” flow 
of the “phase fluid.” 

Surfaces of section for two degree of freedom systems and 
for periodically driven one degree of freedom systems are area- 
preserving. Abstract area-preserving maps of a phase plane onto 
itself show the same division of the phase space into chaotic and 
regular regions as surfaces of section generated by dynamical sys- 
tems. They also show transitions to large-scale chaos. 


3.11 Projects 


Exercise 3.14: Periodically driven pendulum 


Explore the dynamics of the driven pendulum, using the surface of sec- 
tion method. We are interested in exploring the regions of parameter 
space over which various phenomena occur. Consider a pendulum of length 9.8 m, mass 1 kg, and acceleration of gravity $g = 9.8\ \mathrm{m\,s^{-2}}$, giving $\omega_0 = 1$ rad/s. Explore the parameter plane of the amplitude $A$ and frequency $\omega$ of the periodic drive.

Examples of the phenomena to be investigated: 


a. Inverted equilibrium. Show the region of parameter space $(A, \omega)$ in which the inverted equilibrium is stable. If the inverted equilibrium is stable, there is some range of stability, i.e., there is a maximum angle of displacement from the equilibrium that stable oscillations reach. If you have enough time, plot contours in the parameter space for different amplitudes of the stable region.


b. Period doubling of the normal equilibrium. For this case, plot the 
angular momenta of the stable and unstable equilibria as functions of 
the frequency for some given amplitude. 


c. Transition to large-scale chaos. Show the region of parameter space $(A, \omega)$ for which the chaotic zones around the three principal resonance islands are linked.


Exercise 3.15: Spin-orbit surfaces of section 


Write a program to compute surfaces of section for the spin-orbit prob- 
lem, with the section points being recorded at pericenter. Investigate 
the following: 


a. Give a Hamiltonian formulation of the spin-orbit problem introduced 
in section 2.11.2. 


b. For out-of-roundness parameter $\epsilon = 0.1$ and eccentricity $e = 0.1$ measure the widths of the regular islands associated with the 1:1, 3:2, and 1:2 resonances.


c. Explore the surfaces of section for a range of $\epsilon$ for fixed $e = 0.1$. Estimate the critical value of $\epsilon$ above which the main chaotic zones around the 3:2 and the 1:1 resonance islands are merged.


d. For a fixed eccentricity $e = 0.1$ trace the location on the surface of section of the stable and unstable fixed points associated with the 1:1 resonance as a function of the out-of-roundness $\epsilon$.


4 Phase Space Structure


When we try to represent the figure formed by 
these two curves and their intersections in a finite 
number, each of which corresponds to a doubly 
asymptotic solution, these intersections form a 
type of trellis, tissue, or grid with infinitely 
serrated mesh. Neither of these two curves must 
ever cut across itself again, but it must bend back 
upon itself in a very complex manner in order to 
cut across all of the meshes in the grid an infinite 
number of times. 

The complexity of this figure will be striking, and I 
shall not even try to draw it. 


Henri Poincaré New Methods of Celestial 
Mechanics, volume III, Chapter XXXIII, Section 
397, (1892). 


We have seen rather complicated features appear as part of the 
Poincaré sections of a variety of systems. We have seen fixed 
points, invariant curves, resonance islands, and chaotic zones in 
such diverse systems as the driven pendulum, the non-axisymmetric 
top, the Hénon-Heiles system, and the spin-orbit coupling of a 
satellite. Indeed, even in the standard map, where there is no 
continuous process sampled by the surface of section, the phase 
space shows similar features. 

The motion of other systems is simpler. For some systems con- 
served quantities can be used to reduce the solution to the eval- 
uation of definite integrals. An example is the axisymmetric top. 
Two symmetries imply the existence of two conserved momenta, 
and time independence of the Hamiltonian implies energy conser- 
vation. Using these conserved quantities, determining the motion 
is reduced to the evaluation of definite integrals of the periodic 
motion of the tilt angle as a function of time. Such systems do 
not exhibit chaotic behavior; on a surface of section the conserved 
quantities constrain the points to fall on curves. We may conjecture that if points on a surface of section do not appear to fall on curves, then a sufficient number of conserved quantities do not exist to reduce the solution to quadratures.

We have seen a number of instances in which the behavior of a 
system changes qualitatively as additional effects are added. The 
free rigid body can be reduced to quadratures, but the addition 
of gravity gradient torques in the spin-orbit system yields the fa- 
miliar mixture of regular and chaotic motions. The motion of an 
axisymmetric top is also reducible to quadratures, but if the top is 
made non-axisymmetric then the mixed phase space appears. The 
system studied by Hénon and Heiles, with the classic mixed phase 
space, can be thought of as a solvable pair of harmonic oscillators 
with non-linear coupling terms. The pendulum is solvable, but 
the driven pendulum has the mixed phase space. 

We observe that, as additional effects are turned on, qualita- 
tive changes occur in the phase space. Resonance islands appear, 
chaotic zones appear, some invariant curves disappear, but oth- 
ers persist. Why do resonance islands appear? How does chaotic 
behavior arise? When do invariant curves persist? Can we draw 
any general conclusions? 


4.1 Emergence of the Mixed Phase Space 


We can get some insight into these qualitative changes of behavior 
by considering systems in which the additional effects are turned 
on by varying a parameter. For some value of the parameter 
the system has a sufficient number of conserved quantities to be 
reducible to quadratures; as we vary the parameter away from 
this value we can study how the mixed phase space appears. The 
driven pendulum offers an archetypal example of such a system.
If the amplitude of the drive is zero, then solutions of the driven 
pendulum are the same as the solutions of the undriven pendulum. 
We have seen surfaces of section for the strongly driven pendulum, 
illustrating the mixed phase space. Here we crank up the drive 
slowly and study how the phase portrait changes. 

The motion of the driven pendulum with zero amplitude drive 
is the same as that of an undriven pendulum. The motion of a 
pendulum was described in section 3.3. Energy is conserved, so 
all orbits are level curves of the Hamiltonian in the phase plane 
(see figure 4.1). There are three regions of the phase plane that 
have qualitatively different types of motion: the region in which 


Figure 4.1 The phase plane of the pendulum has three regions displaying two distinct kinds of behavior. Trajectories lie on the contours of the Hamiltonian. Trajectories may oscillate, making ovoid curves around the equilibrium point, or they may circulate, producing wavy tracks outside the eye-shaped region. The eye-shaped region is delimited by the separatrix. This pendulum has length 1 m, and the acceleration of gravity is $9.8\ \mathrm{m\,s^{-2}}$.


the pendulum oscillates, the region in which the pendulum circu- 
lates in one direction, and the region of circulation in the other 
direction. In the center of the oscillation region there is a stable 
equilibrium, at which the pendulum is hanging motionless. At 
the boundaries between these regions the pendulum is asymptotic 
to the unstable equilibrium, at which the pendulum is standing 
upright. There are two asymptotic trajectories, corresponding to 
the two ways the equilibrium can be approached. Each of these 
is also asymptotic to the unstable fixed point going backward in 
time. 


Driven pendulum sections with zero drive 

Now consider the periodically driven pendulum, but with zero- 
amplitude drive. The state of the driven pendulum is specified 
by an angle coordinate, its conjugate momentum, and the phase 
of the periodic drive. With zero-amplitude drive the evolution of 
Figure 4.2 A surface of section for the driven pendulum, with zero-
amplitude drive. The effect is to sample the trajectories of the undriven 
pendulum, which lie on the contours of the Hamiltonian. Only a small 
number of points are plotted for each trajectory to illustrate the fact that 
for zero-amplitude drive the surface of section samples the continuous 
trajectories of the undriven pendulum. 


“driven” pendulum is the same as the undriven pendulum. The 
phase of the drive does not affect the evolution, but we consider 
the phase of the drive as part of the state so we can give a uniform 
description that allows us to include the zero-amplitude drive case 
with the non-zero amplitude case. 

For the driven pendulum we make stroboscopic surfaces of sec- 
tion by sampling the state at the drive period, and plotting the 
angular momentum versus the angle (see figure 4.2). For zero- 
amplitude drive, the section points are confined to the curves 
traced by trajectories of the undriven pendulum. For each kind 
of orbit that we saw in the one degree of freedom problem, there 
are orbits of the driven pendulum that generate a corresponding 
pattern of points on the section. 

The two stationary orbits at the equilibrium points of the pen- 
dulum appear as points on the surface of section. Equilibrium 
points are fixed points of the Poincaré map. 
Section points for the oscillating orbits of the pendulum fall on
the corresponding contour of the Hamiltonian. Section points for 
the circulating orbits of the pendulum are likewise confined to the 
corresponding contour of the Hamiltonian. We notice that the 
appearance of the points generated by orbits on different contours 
is different. Typically, if we collected more points on the surface of 
section the points would eventually fill in the contours. However, 
there are actually two possibilities. Remember that the period of 
the pendulum is different for different trajectories. If the period 
of the pendulum is commensurate with the period of the drive 
then only a finite number of points will appear on the section. 
Two periods are commensurate if one is a rational multiple of the 
other. If the two periods are incommensurate then the section 
points never repeat. In fact, the points fill the contour densely, 
coming arbitrarily close to every point on the contour. 
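A toy model makes the commensurate/incommensurate distinction concrete. It rests on an assumption not derived in the text: that along a given contour the pendulum’s motion can be described by a phase that advances uniformly. Each drive period then advances the phase by a fixed fraction ν of a full cycle, where ν is the ratio of the drive period to the pendulum period. The section points sit at phases ν, 2ν, 3ν, … reduced mod 1. Exact rational arithmetic shows a commensurate ratio producing only finitely many points. A floating-point stand-in for an irrational ratio produces no repeats in a finite run, which can only suggest, not prove, that the points never repeat.

```python
import math
from fractions import Fraction

def section_phases(nu, n):
    """Phases (fractions of a cycle, in [0, 1)) after 1, 2, ..., n drive periods,
    assuming each drive period advances the phase by nu."""
    phase = 0 * nu          # zero of the same numeric type as nu
    phases = []
    for _ in range(n):
        phase = (phase + nu) % 1
        phases.append(phase)
    return phases

commensurate = section_phases(Fraction(2, 5), 1000)      # period ratio 2/5
incommensurate = section_phases(math.sqrt(2) - 1, 1000)  # irrational ratio (as a float)
print(len(set(commensurate)))    # 5 distinct section points
print(len(set(incommensurate)))  # 1000 distinct section points
```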

Section points for the asymptotic trajectories of the pendulum 
fall on the contour of the Hamiltonian containing the saddle point. 
Each asymptotic orbit generates a sequence of isolated points that 
accumulate near the fixed point. No individual orbit fills the sep- 
aratrix on the section. 


Driven pendulum sections for small drive 

Now consider the surface of section for small drive amplitude (see 
figure 4.3). The amplitude of the drive is $A = 0.001$ m; the drive frequency is $4.2\,\omega_0$, where $\omega_0 = \sqrt{g/l}$. The overall appearance of
the surface of section is similar to the section with zero-amplitude 
drive. Many orbits appear to lie on invariant curves similar to the 
invariant curves of the zero-drive case. However, there are several 
new features. 

There are now resonance regions that correspond to the pendulum rotating in lock with the drive. These features are found in the upper and lower circulating regions of the surface of section.
Each island has a fixed point for which the pendulum rotates ex- 
actly once per cycle of the drive. In general, fixed points on the 
surface of section correspond to periodic motions of the system 
in the full phase space. The fixed point is at $\pm\pi$, indicating that
the pendulum is vertical at the section phase of the drive. For or- 
bits in the resonance region away from the fixed point the points 
on the section apparently generate curves that surround the fixed 
Figure 4.3 A surface of section for the driven pendulum, with non-zero drive amplitude $A = 0.001$ m and drive frequency $4.2\,\omega_0$. Many trajectories apparently generate invariant curves, as in the zero-amplitude drive case. Here, in addition, some orbits belong to island chains and others are chaotic. The most apparent chaotic orbit is near the separatrix of the undriven pendulum.


point.¹ For these orbits the pendulum rotates on average once per
drive, but the phase of the pendulum is sometimes ahead of the 
drive and sometimes behind it. 

There are other islands that appear with non-zero amplitude 
drive. In the central oscillation region there is a six-fold chain 
of secondary islands. For this orbit the pendulum is oscillating, 
and the period of the oscillation is commensurate with the drive. 
The six islands are all generated by a single orbit. In fact, the 
islands are visited successively in a clockwise direction. After six 
cycles of the drive the section point returns to the same island 
but falls at a different point on the island curve, gradually tracing
out the island curve after many iterations. The motion of the pendulum
is not periodic, but is locked in a resonance so that on average it 
oscillates once for every six cycles of the drive. 


¹ Keep in mind that the abscissa is an angle.

Another feature which appears is a narrow chaotic region near
where the separatrix was in the zero-amplitude drive pendulum. 
We find that chaotic behavior typically makes its most prominent 
appearance near separatrices. This is not surprising because the 
difference in velocities that distinguish whether the pendulum ro- 
tates or oscillates is small for orbits near the separatrix. As the 
pendulum approaches the top, whether it receives the extra nudge 
it needs to go over the top depends on the phase of the drive. 

Actually, the apparent separatrices of the resonance islands for 
which the pendulum period is equal to the drive period are each 
generated by a chaotic orbit. To see that this orbit appears to 
occupy an area one would have to magnify the picture by about
a factor of $10^4$.

As the drive amplitude is increased the main qualitative changes 
are the appearance of resonance islands and chaotic zones. Some 
qualitative characteristics of the zero-drive case remain. For instance,
many orbits appear to lie on invariant curves. This behavior is 
not particular to the driven pendulum; similar features quite gen- 
erally arise as additional effects are added to problems that are 
reducible to quadratures. This chapter is devoted to understand- 
ing in greater detail how these generic features arise. 


4.2 Linear Stability of Fixed Points 


Qualitative changes are associated with fixed points of the surface 
of section. As the drive is turned on chaotic zones appear at fixed 
points on separatrices of the undriven system, and we observe the 
appearance of new fixed points associated with resonance islands. 
Here we investigate the behavior of systems near fixed points. We 
can distinguish two types of fixed points of a dynamical system. 
There are fixed points of the differential equations governing the 
evolution. These are equilibrium points of the system. There 
are also fixed points on a surface of section. These are either 
equilibrium points or periodic orbits of the system. 


4.2.1 Equilibria of Differential Equations 


Consider first the case of a fixed point of a system of differential 
equations. If a system is initially at an equilibrium point, the 
system remains there. What can we say about the evolution of the 
system for points near such an equilibrium point? This is actually 
a very difficult question, which is not completely answered. We
can, however, understand quite a lot about the motion of systems
near equilibrium. The first step is to investigate the evolution 
of a linear approximation to the differential equations near the 
equilibrium. This part is easy, and is the subject of linear stability 
analysis. Later, we will address what the linear analysis implies 
for the actual problem. 
Consider a system of ordinary differential equations

$$Dz(t) = F(t, z(t)), \tag{4.1}$$

with components

$$Dz^i(t) = F^i(t, z^0(t), \ldots, z^{n-1}(t)), \tag{4.2}$$

where $n$ is the dimension of the state space. An equilibrium point
of this system of equations is a point $z_e$ for which the state
derivative is zero:

$$0 = F(t, z_e). \tag{4.3}$$

That this is zero at all moments for the equilibrium solution
implies $\partial_0 F(t, z_e) = 0$.

Next consider a state path $z'$ that passes near the equilibrium
point. The path displacement $\zeta$ is defined so that at time $t$

$$z'(t) = z_e + \zeta(t). \tag{4.4}$$

We have

$$D\zeta(t) = Dz'(t) = F(t, z_e + \zeta(t)). \tag{4.5}$$

If $\zeta$ is small we can write the right-hand side as a Taylor series
in $\zeta$:

$$D\zeta(t) = F(t, z_e) + \partial_1 F(t, z_e)\,\zeta(t) + \cdots, \tag{4.6}$$

but the first term is zero because $z_e$ is an equilibrium point, so

$$D\zeta(t) = \partial_1 F(t, z_e)\,\zeta(t) + \cdots. \tag{4.7}$$




If $\zeta$ is small, the evolution is approximated by the linear terms.
Linear stability analysis investigates the evolution of the
approximate equation

$$D\zeta(t) = \partial_1 F(t, z_e)\,\zeta(t). \tag{4.8}$$


These are the variational equations (3.140) with the equilibrium 
solution substituted for the reference trajectory. The relationship 
of the solutions of this linearized system to the full system is a 
difficult mathematical problem which is not fully resolved. 
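When $F$ is available only as a procedure, one way to get the matrix of $\partial_1 F(t, z_e)$ is to approximate it by central differences. The sketch below is our own illustration, not a method used in the text; it assumes states are plain Scheme lists of numbers, and `jacobian-at`, `damped-oscillator`, and the step size `h` are hypothetical names. For a linear $F$ the central difference is exact up to rounding; for a nonlinear $F$ the truncation error shrinks roughly like $h^2$ until rounding error takes over.

```scheme
;; Central-difference approximation to the matrix of d_1 F(t, z) at the
;; state z.  F takes a time and a state (a list of numbers) and returns
;; the state derivative as a list.  Row i of the result holds dF^i/dz^j.
(define (jacobian-at F t z h)
  (define n (length z))
  ;; copy of z with component j displaced by delta
  (define (displace j delta)
    (let loop ((k 0) (zs z) (acc '()))
      (if (null? zs)
          (reverse acc)
          (loop (+ k 1)
                (cdr zs)
                (cons (if (= k j) (+ (car zs) delta) (car zs)) acc)))))
  ;; column j: (F(z + h e_j) - F(z - h e_j)) / 2h
  (define (column j)
    (map (lambda (plus minus) (/ (- plus minus) (* 2 h)))
         (F t (displace j h))
         (F t (displace j (- h)))))
  (let loop ((j 0) (columns '()))
    (if (= j n)
        (apply map list (reverse columns))   ; transpose columns into rows
        (loop (+ j 1) (cons (column j) columns)))))

;; Hypothetical example: a damped linear oscillator Dz = (z1, -4 z0 - z1),
;; whose equilibrium is the origin.
(define (damped-oscillator t z)
  (list (cadr z) (- (* -4 (car z)) (cadr z))))

(jacobian-at damped-oscillator 0 '(0.0 0.0) 1e-6)
;; approximately ((0. 1.) (-4. -1.))
```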

If we restrict attention to autonomous systems ($\partial_0 F = 0$), then
the variational equations at an equilibrium are a linear system of
ordinary differential equations with constant coefficients.2 Such
systems can be solved analytically. To simplify the notation, let
$M = \partial_1 F(t, z_e)$, so

$$D\zeta(t) = M\zeta(t). \tag{4.9}$$

We seek a solution of the form

$$\zeta(t) = \alpha e^{\lambda t}, \tag{4.10}$$

where $\alpha$ is a structured constant with the same number of
components as $\zeta$. Substituting, we find

$$\lambda \alpha e^{\lambda t} = M \alpha e^{\lambda t}. \tag{4.11}$$

The exponential factor is not zero, so we find

$$M\alpha = \lambda\alpha, \tag{4.12}$$

which is an equation for the eigenvalue $\lambda$ and (normalized)
eigenvector $\alpha$. In general, there are $n$ eigenvalues and $n$
eigenvectors, so we must add a subscript to both $\alpha$ and $\lambda$
indicating the particular solution. The general solution is an
arbitrary linear combination of these individual solutions. The
eigenvalues are solutions of the characteristic equation

$$0 = \det(\mathbf{M} - \lambda\mathbf{I}), \tag{4.13}$$

where $\mathbf{M}$ is the matrix representation of $M$ and $\mathbf{I}$ is
the identity matrix of the same dimension. The elements of $\mathbf{M}$
are real, so we know that the eigenvalues $\lambda$ are either real or
come in complex-conjugate pairs. We assume the eigenvalues are all
distinct.3

2 Actually, all we need is $\partial_0 \partial_1 F(t, z_e) = 0$.

If the eigenvalue is real, then the solution is exponential, as
assumed. If the eigenvalue $\lambda > 0$, then the solution expands
exponentially along the direction $\alpha$; if $\lambda < 0$, then the
solution contracts exponentially along the direction $\alpha$.
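For a $2 \times 2$ system the characteristic equation (4.13) is the quadratic $\lambda^2 - (\operatorname{tr}\mathbf{M})\lambda + \det\mathbf{M} = 0$, so the eigenvalues can be written down directly. The sketch below is a minimal illustration; the name `eigenvalues-2x2` and the list-of-rows matrix representation are our own assumptions, and it relies on a Scheme with complex-number support (such as MIT Scheme) so that it also covers the complex case treated next.

```scheme
;; Eigenvalues of a real 2x2 matrix ((m00 m01) (m10 m11)), computed as
;; the roots of det(M - lambda I) = lambda^2 - (tr M) lambda + det M.
;; A negative discriminant makes sqrt return an imaginary number, so the
;; two roots then come out as a complex-conjugate pair.
(define (eigenvalues-2x2 m)
  (let* ((a (car (car m))) (b (cadr (car m)))
         (c (car (cadr m))) (d (cadr (cadr m)))
         (tr (+ a d))
         (det (- (* a d) (* b c)))
         (root (sqrt (- (* tr tr) (* 4 det)))))
    (list (/ (+ tr root) 2) (/ (- tr root) 2))))

(eigenvalues-2x2 '((2 0) (0 3)))      ; real roots 3 and 2
(eigenvalues-2x2 '((0 1) (-4 -1)))    ; complex pair -1/2 +/- i sqrt(15)/2
```

In the second example the negative real part means both real solutions built from the pair spiral inward.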

If the eigenvalue is complex, we can form real solutions by
combining the two solutions for the complex-conjugate pair of
eigenvalues. Let $\lambda = a + ib$, with real $a$ and $b$, be one such
complex eigenvalue. Let $\alpha = u + iv$, where $u$ and $v$ are real, be
the eigenvector corresponding to it. So there is a complex solution
of the form

$$\begin{aligned}
\zeta_c(t) &= (u + iv)\,e^{(a+ib)t} \\
&= (u + iv)\,e^{at}(\cos bt + i \sin bt) \\
&= e^{at}(u \cos bt - v \sin bt) + i\,e^{at}(u \sin bt + v \cos bt).
\end{aligned} \tag{4.14}$$

The complex conjugate of this solution is also a solution, because
the ordinary differential equation is linear with real coefficients.
This complex-conjugate solution is associated with the eigenvalue
that is the complex conjugate of the original complex eigenvalue.
So the real and imaginary parts of $\zeta_c$ are real solutions:

$$\begin{aligned}
\zeta_r(t) &= e^{at}(u \cos bt - v \sin bt), \\
\zeta_i(t) &= e^{at}(u \sin bt + v \cos bt).
\end{aligned} \tag{4.15}$$


These two solutions reside in the plane containing the vectors u 
and v. If a is positive both solutions spiral outwards exponentially, 
and if a is negative they both spiral inwards. If a is zero, both 
solutions trace the same ellipse, but with different phases. 
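As a concrete check of (4.14) and (4.15), take the following matrix, which is our own illustrative choice rather than one from the text; its eigenvalues are $a \pm ib$:

$$\mathbf{M} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}, \qquad \mathbf{M}\begin{bmatrix} 1 \\ -i \end{bmatrix} = (a + ib)\begin{bmatrix} 1 \\ -i \end{bmatrix},$$

so $u = (1, 0)$ and $v = (0, -1)$, and (4.15) gives

$$\zeta_r(t) = e^{at}\begin{bmatrix} \cos bt \\ \sin bt \end{bmatrix}, \qquad \zeta_i(t) = e^{at}\begin{bmatrix} \sin bt \\ -\cos bt \end{bmatrix}.$$

Differentiating, $D\zeta_r(t) = e^{at}(a\cos bt - b\sin bt,\; a\sin bt + b\cos bt) = \mathbf{M}\zeta_r(t)$, and similarly for $\zeta_i$. For $a = 0$ both solutions run around the unit circle, with $\zeta_i$ lagging $\zeta_r$ by a quarter period.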
3 If the eigenvalues are not distinct, then the form of the solution is modified.

Again, the general solution is an arbitrary linear combination
of the particular real solutions corresponding to the various
eigenvalues. So if we denote the $k$th real eigensolution $\zeta_k(t)$,
then the general solution is

$$\zeta(t) = \sum_k A_k \zeta_k(t), \tag{4.16}$$

where the $A_k$ may be determined by the initial conditions (the state
at a given time).
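For a two-dimensional system with distinct real eigenvalues, determining the $A_k$ in (4.16) from an initial state is a $2 \times 2$ linear solve. The sketch below is our own illustration; `general-solution-2d` is a hypothetical name, the eigenpairs are supplied by the caller, and Cramer's rule is simply the most compact way to solve the small system.

```scheme
;; General solution (4.16) for a 2x2 system D zeta = M zeta with distinct
;; real eigenvalues l1, l2 and eigenvectors a1, a2 (lists (x y)).
;; The coefficients A1, A2 solve zeta0 = A1 a1 + A2 a2 (Cramer's rule);
;; the returned procedure maps t to A1 e^{l1 t} a1 + A2 e^{l2 t} a2.
(define (general-solution-2d l1 a1 l2 a2 zeta0)
  (let* ((det (- (* (car a1) (cadr a2)) (* (car a2) (cadr a1))))
         (A1 (/ (- (* (car zeta0) (cadr a2)) (* (car a2) (cadr zeta0))) det))
         (A2 (/ (- (* (car a1) (cadr zeta0)) (* (car zeta0) (cadr a1))) det)))
    (lambda (t)
      (map (lambda (x1 x2)
             (+ (* A1 (exp (* l1 t)) x1)
                (* A2 (exp (* l2 t)) x2)))
           a1 a2))))

;; Example: M = ((0 1) (1 0)) has eigenvalue 1 with eigenvector (1 1)
;; and eigenvalue -1 with eigenvector (1 -1).  Starting from (1 0),
;; the solution is (cosh t, sinh t).
(define zeta (general-solution-2d 1 '(1 1) -1 '(1 -1) '(1 0)))
(zeta 1.0)   ; approximately (1.5430806 1.1752012)
```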


Exercise 4.1: Pendulum 


Carry out the details of finding the eigensolutions for the two equilibria
of the pendulum ($\theta = 0$ and $\theta = \pi$, both with $p_\theta = 0$). How is the
small-amplitude oscillation frequency related to the eigenvalues? How are
the eigendirections related to the contours of the Hamiltonian?


4.2.2 Fixed Points of Maps 


Fixed points on a surface of section correspond either to equilib- 
rium points of the system or to a periodic motion of the system. 
Linear stability analysis of fixed points is similar to the linear 
stability analysis for equilibrium points. 

Let $T$ be a map of the state space onto itself, as might be
generated by a surface of section. A trajectory sequence is
generated by successive iteration of the map $T$. Let $x(n)$ be the
$n$th point of the sequence. The map carries one point of the
trajectory sequence to the next: $x(n+1) = T(x(n))$. We can
represent successive iterations of the map by a superscript: so $T^i$
indicates $T$ composed $i$ times. For example, $T^2(x) = T(T(x))$.
Thus $x(n) = T^n(x(0))$.4
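Read as a procedure, the superscript notation is just repeated application. The sketch below uses a hypothetical name, `map-power`, and a one-argument convention for points; the `iterated-map` procedure used later in this chapter instead takes the coordinates as separate arguments along with continuation arguments.

```scheme
;; T^n: compose the map T with itself n times (T^0 is the identity).
;; T is any one-argument procedure taking a point to a point.
(define (map-power T n)
  (lambda (x)
    (let loop ((k n) (x x))
      (if (= k 0)
          x
          (loop (- k 1) (T x))))))

;; x(n) = T^n(x(0)): for the doubling map T(x) = 2x, T^3(1) = 8.
((map-power (lambda (x) (* 2 x)) 3) 1)   ; => 8
```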

A fixed point $x_0$ of the map $T$ satisfies

$$x_0 = T(x_0). \tag{4.17}$$

Let $x$ be some trajectory initially near $x_0$, and let $\xi$ be the
deviation from $x_0$: $x(n) = x_0 + \xi(n)$. The trajectory satisfies

$$x_0 + \xi(n+1) = T(x_0 + \xi(n)). \tag{4.18}$$

Expanding the right-hand side as a Taylor series, we obtain

$$x_0 + \xi(n+1) = T(x_0) + DT(x_0)\,\xi(n) + \cdots, \tag{4.19}$$

but $x_0 = T(x_0)$, so

$$\xi(n+1) = DT(x_0)\,\xi(n) + \cdots. \tag{4.20}$$

Linear stability analysis considers the evolution of the system
truncated to the linear terms

$$\xi(n+1) = DT(x_0)\,\xi(n). \tag{4.21}$$

This is a system of linear difference equations, with constant
coefficients $DT(x_0)$.

4 The map $T$ is being used as an operator: multiplication is interpreted as
composition.
We assume there are solutions of the form

$$\xi(n) = \rho^n \alpha, \tag{4.22}$$

where $\rho$ is some (complex) number. Substituting this solution in
the linearized evolution equation, we find

$$\rho\alpha = DT(x_0)\,\alpha, \tag{4.23}$$

$$(DT(x_0) - \rho I)\,\alpha = 0, \tag{4.24}$$

where $I$ is the identity function. We see that $\rho$ is an eigenvalue of
the linear transformation $DT(x_0)$, and $\alpha$ is the associated
(normalized) eigenvector. Let $M = DT(x_0)$, and let $\mathbf{M}$ be its
matrix representation. The eigenvalues are determined by

$$\det(\mathbf{M} - \rho\mathbf{I}) = 0. \tag{4.25}$$

The elements of $\mathbf{M}$ are real, so the eigenvalues $\rho$ are either real or
come in complex-conjugate pairs.5

For the real eigenvalues the solutions are just exponential
expansion or contraction along the associated eigenvector $\alpha$:

$$\xi(n) = \rho^n \alpha. \tag{4.26}$$

The solution is expanding if $|\rho| > 1$ and contracting if $|\rho| < 1$.
If the eigenvalues are complex, then the solution is complex, but
the complex solutions corresponding to the complex-conjugate pair
of eigenvalues can be combined to form two real solutions, as was
done for the equilibrium solutions. Let $\rho = \exp(A + iB)$ with real
$A$ and $B$, and $\alpha = u + iv$. A calculation similar to that for the
equilibrium case shows that there are two real solutions

$$\begin{aligned}
\xi_r(n) &= e^{An}(u \cos Bn - v \sin Bn), \\
\xi_i(n) &= e^{An}(u \sin Bn + v \cos Bn).
\end{aligned} \tag{4.27}$$

We see that if $A > 0$ then the solution expands exponentially,
and if $A < 0$ the solution contracts exponentially. Exponential
expansion ($A > 0$) corresponds to $|\rho| > 1$; exponential contraction
corresponds to $|\rho| < 1$. If $A = 0$, then the two real solutions,
and any linear combination of them, trace an ellipse.

5 We assume the eigenvalues are distinct for now.

The general solution is an arbitrary linear combination of the
eigensolutions. Let $\xi_k$ be the $k$th real eigensolution. The
general solution is

$$\xi(n) = \sum_k A_k \xi_k(n), \tag{4.28}$$

where the $A_k$ may be determined by the initial conditions.


Exercise 4.2: Elliptical oscillation 


Show that an arbitrary linear combination of $\xi_r$ and $\xi_i$ traces an ellipse
for $A = 0$.


Exercise 4.3: Standard map 


The standard map (see section 3.9) has fixed points at $I = 0$ for $\theta = 0$
and $\theta = \pi$. Find the full eigensolutions for these two fixed points. For
what ranges of the parameter $K$ are the fixed points linearly stable or
unstable?


4.2.3 Relations Among Exponents 


For maps that are generated by stroboscopic sampling of the evo- 
lution of a system of autonomous differential equations, equilib- 
rium points are fixed points of the map. The eigensolutions of the 
equilibrium of the flow and the eigensolutions of the map at the 
fixed point are then related. Let $\tau$ be the sampling period. Then
$\rho_i = e^{\lambda_i \tau}$.

The Lyapunov exponent is a measure of the rate of exponential
divergence of nearby trajectories from a reference trajectory. If
the reference trajectory is an equilibrium of a flow, then the
Lyapunov exponents are the real parts of the linearized characteristic
exponents $\lambda_i$. If the reference trajectory is a fixed point of a map
generated by a flow (either a periodic orbit or an equilibrium),
then the Lyapunov exponents are the real parts of the logarithms of
the characteristic multipliers, divided by the period of the map.
So if the characteristic multiplier is $\rho = e^{A+iB}$ and the period
of the map is $\tau$, then the Lyapunov exponent is $A/\tau$. A positive
Lyapunov exponent of a fixed point indicates linear instability of
the fixed point.
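The relation $\rho_i = e^{\lambda_i \tau}$ and the recipe "Lyapunov exponent $= A/\tau = \ln|\rho|/\tau$" can be checked numerically for a linear flow. In the sketch below, the procedure names are hypothetical, matrices are lists of rows, and the truncated Taylor series for $e^{\mathbf{M}\tau}$ is our own simplification, adequate only when $\|\mathbf{M}\tau\|$ is modest; a careful code would use a more robust matrix exponential.

```scheme
;; 2x2 matrices are lists of rows ((a b) (c d)).
(define (mat* m n)
  (let ((cols (apply map list n)))          ; columns of n
    (map (lambda (row)
           (map (lambda (col) (apply + (map * row col))) cols))
         m)))
(define (mat+ m n) (map (lambda (r s) (map + r s)) m n))
(define (mat-scale k m) (map (lambda (r) (map (lambda (x) (* k x)) r)) m))

;; exp(M tau) by a truncated Taylor series: sum of (M tau)^k / k!, k <= terms.
(define (mat-exp m tau terms)
  (let loop ((k 1) (term '((1 0) (0 1))) (sum '((1 0) (0 1))))
    (if (> k terms)
        sum
        (let ((next (mat-scale (/ tau k) (mat* term m))))
          (loop (+ k 1) next (mat+ sum next))))))

;; Roots of lambda^2 - (tr M) lambda + det M for a 2x2 matrix.
(define (eigenvalues-2x2 m)
  (let* ((a (car (car m))) (b (cadr (car m)))
         (c (car (cadr m))) (d (cadr (cadr m)))
         (tr (+ a d))
         (det (- (* a d) (* b c)))
         (root (sqrt (- (* tr tr) (* 4 det)))))
    (list (/ (+ tr root) 2) (/ (- tr root) 2))))

;; Lyapunov exponent at a fixed point: log of the multiplier's magnitude
;; (the real part A of log rho) divided by the sampling period tau.
(define (lyapunov-exponent rho tau)
  (/ (log (magnitude rho)) tau))

;; Flow matrix ((0 1) (1 0)) has exponents +1 and -1.  Sampling with
;; tau = 0.5 should give multipliers e^{0.5} and e^{-0.5}.
(define rhos (eigenvalues-2x2 (mat-exp '((0 1) (1 0)) 0.5 20)))
rhos                                                      ; approximately (1.6487 0.6065)
(map (lambda (rho) (lyapunov-exponent rho 0.5)) rhos)     ; approximately (1. -1.)
```

Note that the positive and negative exponents here have equal magnitude; the matrix $e^{\mathbf{M}\tau}$ in this example has unit determinant.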

The Lyapunov exponent carries less information than the characteristic
multipliers or exponents because the imaginary part is lost.
However, the Lyapunov exponent is more generally applicable in 
that it is well defined even for reference trajectories that are not 
periodic. 

In the linear analysis of the fixed point, each characteristic ex- 
ponent corresponds to a subspace of possible linear solutions. For 
instance, for a real characteristic multiplier there is a correspond- 
ing eigendirection, and for any initial displacement along this di- 
rection successive iterates are also along this direction. Complex- 
conjugate pairs of multipliers correspond to a plane of solutions. 
For a displacement initially on this plane, successive iterates are 
also on this plane. 

It turns out that something like this is also the case for the lin- 
earized solutions near a reference trajectory that is not at a fixed 
point. For each non-zero Lyapunov exponent there is a twisting 
subspace so that for an initial displacement in this subspace suc- 
cessive iterates also belong to the subspace. At different points 
along the reference trajectory the unit displacement vector that 
characterizes the direction of this subspace is different. 


Hamiltonian specialization 
For Hamiltonian systems there are additional constraints among 
the eigenvalues. 

Consider first the case of two-dimensional surfaces of section. 
We have seen that Hamiltonian surfaces of section are area pre- 
serving. As we saw in the proof of Liouville’s theorem, area 
preservation implies that the determinant of the derivative of the 
transformation is 1. At a fixed point $x_0$ the linearized map is
$\xi(n+1) = DT(x_0)\,\xi(n)$. So $M = DT(x_0)$ has unit determinant.
Now the determinant is the product of the eigenvalues, so for a 
fixed point on a Hamiltonian surface of section the two eigenval- 
ues must be inverses of each other. We also have the constraint 
Figure 4.4 The eigenvalues for fixed points of a two-dimensional
Hamiltonian map. The eigenvalues are either complex-conjugate pairs 
that lie on the unit circle or they are real. For each eigenvalue the inverse 
is also an eigenvalue. 


that if an eigenvalue is complex then the complex conjugate of 
the eigenvalue is also an eigenvalue. These two conditions im- 
ply that the eigenvalues must either be real and inverses, or be 
complex-conjugate pairs on the unit circle (see figure 4.4). 

Fixed points for which the characteristic multipliers all lie on 
the unit circle are called elliptic fixed points. The solutions of 
the linearized variational equations trace ellipses around the fixed 
point. Elliptic fixed points are linearly stable. 

Fixed points with positive real characteristic multipliers are 
called hyperbolic fixed points. For two-dimensional maps, there 
is an exponentially expanding subspace and an exponentially con- 
tracting subspace. The general solution is a linear combination 
of these. Fixed points for which the characteristic multipliers are 
negative are called hyperbolic with reflection. 

The edge case of two degenerate characteristic multipliers is 
called parabolic. For two degenerate eigenvalues the general solu- 
tion grows linearly. This happens at points of bifurcation where 
elliptic points become hyperbolic points or vice versa. 
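For a two-dimensional area-preserving map the classification above can be read off the trace alone: with $\det\mathbf{M} = 1$ the multipliers solve $\rho^2 - (\operatorname{tr}\mathbf{M})\rho + 1 = 0$. The following sketch is our own illustration; `classify-fixed-point` is a hypothetical name, `tol` is an assumed tolerance for floating-point entries, and the example matrices are chosen for illustration only.

```scheme
;; Classify a fixed point of a two-dimensional area-preserving map from
;; the matrix ((a b) (c d)) of DT at the fixed point.  Because det = 1,
;; the multipliers solve rho^2 - (tr M) rho + 1 = 0, so the trace decides:
;;   |tr| < 2  complex pair on the unit circle  -> elliptic
;;   tr  > 2   real, positive, rho and 1/rho    -> hyperbolic
;;   tr  < -2  real, negative                   -> hyperbolic with reflection
;;   |tr| = 2  double root +1 or -1             -> parabolic
;; tol absorbs rounding error for floating-point entries.
(define (classify-fixed-point m tol)
  (let* ((a (car (car m))) (b (cadr (car m)))
         (c (car (cadr m))) (d (cadr (cadr m)))
         (det (- (* a d) (* b c)))
         (tr (+ a d)))
    (cond ((> (abs (- det 1)) tol) 'not-area-preserving)
          ((<= (abs (- (abs tr) 2)) tol) 'parabolic)
          ((< (abs tr) 2) 'elliptic)
          ((> tr 2) 'hyperbolic)
          (else 'hyperbolic-with-reflection))))

(classify-fixed-point '((0 -1) (1 0)) 1e-12)    ; => elliptic (a rotation)
(classify-fixed-point '((2 1) (1 1)) 1e-12)     ; => hyperbolic
(classify-fixed-point '((-2 1) (1 -1)) 1e-12)   ; => hyperbolic-with-reflection
(classify-fixed-point '((1 1) (0 1)) 1e-12)     ; => parabolic (a shear)
```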
For two-dimensional Hamiltonian maps these are the only
possibilities. For higher-dimensional Hamiltonian maps, we can get
combinations of these: some characteristic multipliers can be real
and others complex-conjugate pairs. We might imagine that in
addition there would be many other types of fixed points that
occur in higher dimension. In fact, there is only one additional
type, shown in figure 4.5. For Hamiltonian systems of arbitrary
dimension it is still the case that for each eigenvalue the complex
conjugate and the inverse are also eigenvalues. We can prove this
starting from a result that we will prove in chapter 5. Consider
the map of the phase space onto itself that is generated by time
evolution of a Hamiltonian system. Let $z = (q, p)$; then the map
$T_\Delta$ satisfies $z(t + \Delta) = T_\Delta(z(t))$ for solutions $z$ of Hamilton's
equations. We will show in chapter 5 that the derivative of the map
$T_\Delta$ is symplectic, whether or not the starting point is at a fixed
point. A $2n \times 2n$ matrix $\mathbf{M}$ is symplectic if it satisfies

$$\mathbf{M}\mathbf{J}\mathbf{M}^T = \mathbf{J}, \tag{4.29}$$

where $\mathbf{J}$ is the $2n$-dimensional symplectic unit:

$$\mathbf{J} = \begin{bmatrix} \mathbf{0}_{n\times n} & \mathbf{1}_{n\times n} \\ -\mathbf{1}_{n\times n} & \mathbf{0}_{n\times n} \end{bmatrix}, \tag{4.30}$$

with the $n \times n$ unit matrix $\mathbf{1}_{n\times n}$ and the $n \times n$ zero matrix $\mathbf{0}_{n\times n}$.
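Condition (4.29) is easy to test directly for a matrix with exact entries. In the sketch below the procedure names are hypothetical and matrices are lists of rows. The $4 \times 4$ example is our own choice: the linear point transformation $q \mapsto \mathbf{A}q$ together with $p \mapsto (\mathbf{A}^T)^{-1}p$, a standard example of a symplectic matrix. Because `equal?` compares entries exactly, this check suits exact (integer or rational) matrices; floating-point matrices would need a tolerance.

```scheme
;; Matrices are lists of rows.
(define (transpose m) (apply map list m))
(define (mat* m n)
  (let ((cols (transpose n)))
    (map (lambda (row)
           (map (lambda (col) (apply + (map * row col))) cols))
         m)))

;; The 2n x 2n symplectic unit J of equation (4.30).
(define (symplectic-unit n)
  (define (entry i j)
    (cond ((and (< i n) (= j (+ i n))) 1)      ; upper-right block: +identity
          ((and (>= i n) (= j (- i n))) -1)    ; lower-left block: -identity
          (else 0)))
  (define (indices k)                          ; (0 1 ... k-1)
    (let loop ((i (- k 1)) (acc '()))
      (if (< i 0) acc (loop (- i 1) (cons i acc)))))
  (map (lambda (i) (map (lambda (j) (entry i j)) (indices (* 2 n))))
       (indices (* 2 n))))

;; M is symplectic if M J M^T = J (equation 4.29).
(define (symplectic? m)
  (let ((J (symplectic-unit (quotient (length m) 2))))
    (equal? (mat* (mat* m J) (transpose m)) J)))

;; q -> A q, p -> (A^T)^{-1} p with A = ((2 1) (1 1)); here (A^T)^{-1} = ((1 -1) (-1 2)).
(define M4 '((2 1 0 0)
             (1 1 0 0)
             (0 0 1 -1)
             (0 0 -1 2)))
(symplectic? M4)                                            ; => #t
(symplectic? '((2 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 1)))    ; => #f
```

For $n = 1$ the condition reduces to $\det\mathbf{M} = 1$, which matches the area-preservation argument for two-dimensional sections.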
Using the symplectic property we can show that in general for
each eigenvalue its inverse is also an eigenvalue. Assume $\rho$ is
an eigenvalue, so $\rho$ satisfies $\det(\mathbf{M} - \rho\mathbf{I}) = 0$. This equation
is unchanged if $\mathbf{M}$ is replaced by its transpose, so $\rho$ is also an
eigenvalue of $\mathbf{M}^T$:

$$\mathbf{M}^T \alpha' = \rho\,\alpha'. \tag{4.31}$$

From this we can see that

$$\frac{1}{\rho}\,\alpha' = (\mathbf{M}^T)^{-1}\alpha'. \tag{4.32}$$

Now, from the symplectic property we have

$$\mathbf{M}\mathbf{J} = \mathbf{J}(\mathbf{M}^T)^{-1}. \tag{4.33}$$
Figure 4.5 If there is more than one degree of freedom the eigenvalues
for fixed points of a Hamiltonian map may lie in a quartet, with two 
complex-conjugate pairs. The magnitudes of the pairs must be inverses. 
This enforces the constraint that the expansion produced by the roots 
with magnitude greater than one is counterbalanced by the contraction 
produced by the roots with magnitude smaller than one. 


So

$$\mathbf{M}\mathbf{J}\alpha' = \mathbf{J}(\mathbf{M}^T)^{-1}\alpha' = \frac{1}{\rho}\,\mathbf{J}\alpha', \tag{4.34}$$

and we can conclude that $1/\rho$ is an eigenvalue of $\mathbf{M}$ with the
eigenvector $\mathbf{J}\alpha'$. From the fact that for every eigenvalue its
inverse is also an eigenvalue, we deduce that the determinant of the
transformation $M$, which is the product of the eigenvalues, is one.

The constraints that the eigenvalues must be associated with
inverses and complex conjugates yield exactly one new pattern of
eigenvalues in higher dimensions. Figure 4.5 shows the only new
pattern that is possible.

We have seen that the Lyapunov exponents for fixed points 
are related to the characteristic multipliers for the fixed points, 
so the Hamiltonian constraints on the multipliers correspond to 
Hamiltonian constraints for Lyapunov exponents at fixed points. 
For each characteristic multiplier, the inverse is also a
characteristic multiplier. This means that at fixed points, for each positive
Lyapunov exponent there is a corresponding negative Lyapunov 
exponent with the same magnitude. It turns out that this is also 
true if the reference trajectory is not at a fixed point. For Hamil- 
tonian systems, for each positive Lyapunov exponent there is a 
corresponding negative exponent of equal magnitude. 


Exercise 4.4: Quartet 
Describe (perhaps by drawing cross sections) the orbits that are possible 
with quartets. 


Linear and nonlinear stability 

A fixed point that is linearly unstable indicates that the full sys- 
tem is unstable at that point. What this means is that trajectories 
starting near the fixed point diverge from the fixed point. On the 
other hand, linear stability of a fixed point does not generally 
guarantee that the full system is stable at that point. For a
two-degree-of-freedom Hamiltonian system, the Kolmogorov-Arnold-Moser
theorem proves, under certain conditions, that linear stability
implies nonlinear stability. In higher dimensions, though, it is
not known whether linear stability implies nonlinear stability.


4.3 Homoclinic Tangle 


For the driven pendulum we observe that as the amplitude of the 
drive is increased the separatrix of the undriven pendulum is where 
the most prominent chaotic zone appears. Here we examine the 
motion in the vicinity of the separatrix of the undriven pendulum 
in great detail. What emerges is a remarkably complicated pic- 
ture, first discovered by Henri Poincaré. Indeed, Poincaré stated 
that the picture that had emerged was so complicated that he was 
not even going to attempt to draw it. We will review the argu- 
ment leading to the picture, and compute enough of it to convince 
ourselves of its reality. 

The separatrix of the undriven pendulum is made up of two 
trajectories that are asymptotic to the unstable equilibrium. In 
the driven pendulum with zero drive, there are an infinite number 
of distinct orbits that lie on the separatrix, which are distinguished 
Figure 4.6 The neighborhood of the unstable fixed point of the
pendulum shows the stable and unstable manifolds of the nonlinear
pendulum and of the linearized variational system around the fixed
point. The axes are centered at the fixed point $(\pm\pi, 0)$. The linear
stable and unstable manifolds are labeled by $V^s$ and $V^u$; the
nonlinear stable and unstable manifolds are labeled by $W^s$ and $W^u$.


by the phase of the drive. These orbits are asymptotic to the 
unstable fixed point both forward and backward in time. 

Notice that close to the unstable fixed point the sets of points 
that are asymptotic to the unstable equilibrium must be tangent 
to the linear variational eigenvectors at the fixed point. (See fig- 
ure 4.6.) In a sense, the sets of orbits that are asymptotic to the 
fixed point are extensions to the non-linear problem of the sets 
of orbits that are asymptotic to the fixed point in the linearized 
problem. 

In general, the set of points that are asymptotic to an unstable 
fixed point forward in time is called the stable manifold of the 
fixed point. The set of points that are asymptotic to an unstable 
fixed point backward in time is called the unstable manifold. For 
the driven pendulum with zero amplitude drive all points on the 
separatrix are asymptotic both forward and backward in time to 
the unstable fixed point. So in this case the stable and unstable
manifolds coincide. 

If the drive amplitude is non-zero then there are still one- 
dimensional sets of points that are asymptotic to the unstable 
fixed point forward and backward in time: there are still stable 
and unstable manifolds. Why? The behavior near the fixed point 
is described by the linearized variational system. For the linear 
variational system, points in the space spanned by the unstable 
eigenvector, when mapped backwards in time, are asymptotic to 
the fixed point. Points slightly off this curve may initially ap- 
proach the unstable equilibrium, but eventually will fall away to 
one side or the other. For the driven system with small drive, 
there must still be a curve which separates the points that fall 
away to one side from the points that fall away to the other side. 
Points on the dividing curve must be asymptotic to the unstable 
equilibrium. The dividing set cannot have positive area because 
the map is area preserving. 

For the zero-amplitude drive case the stable and unstable man- 
ifolds are contours of the conserved Hamiltonian. For non-zero 
amplitude the Hamiltonian is no longer conserved. For non-zero 
drive the stable manifolds and unstable manifolds no longer coin- 
cide. This is generally true for non-integrable systems: stable and 
unstable manifolds do not coincide. 

If the stable and unstable manifolds no longer coincide, where
do they go? In general, the stable and unstable manifolds must 
cross one another. The only other possibilities are that they run 
off to infinity or spiral around. Area preservation can be used to 
exclude the spiraling case. We will see that in general there are 
barriers to running away. So the only possibility is that the stable 
and unstable manifolds cross. This is illustrated in figure 4.7. The 
point of crossing of a stable and unstable manifold is called a ho- 
moclinic intersection if the stable and unstable manifolds belong 
to the same unstable fixed point. It is called a heteroclinic in- 
tersection if the stable and unstable manifolds belong to different 
fixed points. 

If the stable and unstable manifolds cross once then there are an 
infinite number of other crossings. The intersection point belongs 
to both the stable and unstable manifolds. That it is on the 
unstable manifold means that all images forward and backward in 
time also belong to the unstable manifold, and likewise for points 
on the stable manifold. Thus all images of the intersection belong 
Figure 4.7 For non-zero drive the stable and unstable manifolds no
longer coincide and in general cross. The dashed circle indicates the 
central intersection. Forward and backward images of this intersection 
are themselves intersections. Because the orbits are asymptotic to the 
fixed point there are an infinity of such intersections. 


to both the stable and unstable manifolds. So these images must 
be additional crossings of the two manifolds. 

We can deduce that there are still more intersections of the 
stable and unstable manifolds. The maps we are considering not 
only preserve area, but they preserve orientation. In the proof of 
Liouville’s theorem we showed that the determinant of the trans- 
formation is one, not just magnitude one. If we consider little 
segments of the stable and unstable manifolds near the intersec- 
tion point then these segments must map near the image of the 
intersection point. That the map preserves orientation implies 
that the manifolds are crossing one another in the same sense as 
at the previous intersection. Therefore there must have been at 
least one more crossing of the stable and unstable manifolds in 
between these two. This is illustrated in figure 4.8. Of course, all 
forward and backward images of these intermediate intersections 
are also intersections. 

As the picture gets more complicated keep in mind that the 
stable manifold cannot cross itself and the unstable manifold can- 
not cross itself. Suppose one did, say by making a little loop. The 
image of this loop under the map must also be a loop. So if there 
was a loop there would have to be an infinite number of loops. 
Figure 4.8 Orientation preservation implies that between an
intersection of the stable and unstable manifolds and the image of this
intersection there is another intersection. Thus there are two alternating
families of intersections. The central intersection and its pre-images and
post-images are labeled $A_i$. Another family is labeled $B_i$.


That would be ok, but what happens as the loop gets close to 
the fixed point? There would still have to be loops, but then the 
stable and unstable manifolds would not have the right behavior: 
the stable and unstable manifolds of the linearized map do not 
have loops. Therefore, the stable and unstable manifolds cannot 
cross themselves.6

We are not done yet! The lobes that are defined by successive 
crossings of the stable and unstable manifolds enclose a certain 
area. The map is area preserving so all images of these lobes must 
have the same area. So there are an infinite number of images of 
these lobes, all with the same area. Furthermore, the boundaries 
of these images cannot cross. As the lobes approach the fixed 
point we get an infinite number of lobes with a base with an 
exponentially shrinking length. In order to pack these together 
on the plane, without the boundaries crossing each other, the 
lobes must stretch out to preserve area. We see that the length of
the lobe must grow roughly exponentially. (It may not be uniform
in width, so it need not be exactly exponential.) This exponential
lengthening of the lobes no doubt bears some responsibility for the 
exponential divergence of nearby trajectories of chaotic orbits, but 


6 Sometimes it is argued that the stable and unstable manifolds cannot cross
themselves on the basis of the uniqueness of solutions of differential equations.
This is an incorrect argument. The stable and unstable manifolds are not
themselves solutions of a differential equation; they are sets of points whose
solutions are asymptotic to the unstable fixed points.
does not prove it. It does, however, suggest a connection between
the fact that chaotic orbits appear to occupy an area on the section 
and the fact that nearby chaotic orbits diverge exponentially. 
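A rough version of this stretching argument can be written down under a crude assumption of our own (not the text's): that a lobe close to the fixed point is a thin strip with base $b_k$ along the stable direction and length $\ell_k$ along the unstable direction after $k$ iterations. If the unstable characteristic multiplier is $\rho > 1$, the stable one is $1/\rho$, so the base shrinks geometrically while the area is preserved:

$$b_k \approx b_0\,\rho^{-k}, \qquad \ell_k\,b_k \approx \ell_0\,b_0 \quad\Longrightarrow\quad \ell_k \approx \ell_0\,\rho^{k}.$$

Since $\ln\rho$ divided by the period of the map is the positive Lyapunov exponent of the fixed point, this heuristic ties the lengthening of the lobes to that exponent, with the same caveat as above: it suggests the connection but does not prove it.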

Actually, the situation is even more complicated. As the lobes 
stretch, they form tendrils that wrap around the separatrix region. 
The tendrils of the unstable manifold can cross the tendrils of the 
stable manifold. Each point of crossing is a new homoclinic
intersection, and so each pre-image and post-image of this point belongs to
both the stable and unstable manifolds, indicating another cross- 
ing of these curves. We could go on and on. No wonder Poincaré 
refused to draw this mess. 


Exercise 4.5: Homoclinic paradox 


How do we fit an infinite number of copies of a finite area in a finite re- 
gion, without allowing the stable and unstable manifolds to cross them- 
selves? Resolve this apparent paradox. 


4.3.1 Computation of Stable and Unstable Manifolds 


The homoclinic tangle is not just a bad dream. We can actually 
compute it. 

Very close to an unstable fixed point the stable and unstable 
manifolds become indistinguishable from the rays along the eigen- 
vectors of the linearized system. So one way to compute the un- 
stable manifold is to take a line of initial conditions close to the 
fixed point along the unstable manifold of the linearized system 
and evolve them forward in time. Similarly, the stable manifold 
can be constructed by taking a line of initial conditions along the 
stable manifold of the linearized system and evolving them back- 
ward in time. 

We can do better than this by choosing some parameter (like
arclength) along the manifold and, for each parameter value,
deciding how many iterations of the map would be required to take
the point back to within some small region of the fixed point. We then
choose an initial condition along the linearized eigenvectors and
iterate it that many times with the map to carry it back out along
the manifold. This idea is implemented in the following program:
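(define ((unstable-manifold T xe ye dx dy A eps) param)
  ;; n = number of map iterations needed to carry a point that starts
  ;; at distance param/A^n from the fixed point out to parameter param
  (let ((n (floor->exact (/ (log (/ param eps)) (log A)))))
    ;; iterate T n times from the seed on the linearized eigenvector;
    ;; cons packages the resulting coordinates as a pair (x . y)
    ((iterated-map T n) (+ xe (* dx (/ param (expt A n))))
                        (+ ye (* dy (/ param (expt A n))))
                        cons
                        list)))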


where T is the map, xe and ye are the coordinates of the fixed 
point, dx and dy are components of the linearized eigenvector, A 
is the characteristic multiplier, eps is a scale within which the 
linearized map is a good enough approximation to T, and param is 
a continuous parameter along the manifold. The program assumes
that there is a basic exponential divergence along the manifold;
that is why we take the logarithm of param to get initial conditions
in the linear regime. This assumption is not exactly true, but it is
good enough for now.
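To see why the logarithm appears, here is a portable sketch of just the seeding step. The names `unstable-point` and `hyperbolic-map` are hypothetical; the sketch does not use `iterated-map`, represents points as lists, and assumes `param >= eps` and an expanding multiplier `A > 1`. On a purely linear map the construction is exact: the seed distance $s = \text{param}/A^n$ lies in $[\epsilon, \epsilon A)$, and $n$ iterations multiply it back up to `param` along the eigenvector. For a nonlinear map the result is only as good as the linear approximation inside the $\epsilon$ region, which is exactly the assumption flagged above.

```scheme
;; Portable restatement of the seeding idea in unstable-manifold, with
;; points as lists (x y) and T a one-argument procedure.  Assumes
;; param >= eps and A > 1 (the expanding multiplier).
(define (unstable-point T xe ye dx dy A eps param)
  (let* ((n (floor (/ (log (/ param eps)) (log A))))   ; iterations needed
         (s (/ param (expt A n))))                     ; seed distance, in [eps, eps*A)
    (let loop ((k n)
               (p (list (+ xe (* dx s)) (+ ye (* dy s)))))
      (if (<= k 0)
          p
          (loop (- k 1) (T p))))))

;; Hypothetical test map: the linear hyperbolic map (x, y) -> (2x, y/2),
;; fixed point (0, 0), unstable eigenvector (1, 0), multiplier 2.
(define (hyperbolic-map p)
  (list (* 2.0 (car p)) (/ (cadr p) 2.0)))

(unstable-point hyperbolic-map 0 0 1 0 2.0 1e-3 0.7)   ; approximately (0.7 0)
```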

The curve is generated by a call to plot-parametric-fill, 
which recursively subdivides intervals of the parameter until there 
are enough points to get a smooth curve. 


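(define (plot-parametric-fill win f a b near?)
  ;; f maps a parameter value to a point (x . y)
  (let loop ((a a) (xa (f a)) (b b) (xb (f b)))
    ;; stop once the parameter interval is at machine resolution
    (if (not (close-enuf? a b (* 10 *machine-epsilon*)))
        (let ((m (/ (+ a b) 2)))
          (let ((xm (f m)))
            (plot-point win (car xm) (cdr xm))
            ;; subdivide each half whose endpoint images are not yet near
            (if (not (near? xa xm))
                (loop a xa m xm))
            (if (not (near? xb xm))
                (loop m xm b xb)))))))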


The near? argument is a test for whether two points are within 
a given distance of each other in the graph. Because some co- 
ordinates are angle variables, this may involve a principal value 
comparison. For example, for the driven pendulum section, the 
horizontal axis is an angle but the vertical axis is not, so the pic- 
ture is on a cylinder: 


(define (cylinder-near? eps)
  (let ((eps2 (square eps)))
    (lambda (x y)
      (< (+ (square ((principal-value pi)
                     (- (car x) (car y))))
            (square (- (cdr x) (cdr y))))
         eps2))))
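Outside a scmutils environment, the same test can be sketched with a hand-written angle reduction. `wrap-angle`, `two-pi`, and `cylinder-close?` are hypothetical stand-ins for the `(principal-value pi)` call and for `cylinder-near?` above; this version reduces angle differences to $[-\pi, \pi)$, and points are pairs `(theta . p)` as in the code above.

```scheme
;; Portable stand-ins (hypothetical names) for the procedures used above.
;; wrap-angle reduces an angle difference to [-pi, pi).
(define two-pi (* 8 (atan 1)))
(define (wrap-angle x)
  (- x (* two-pi (floor (/ (+ x (/ two-pi 2)) two-pi)))))

;; Points are pairs (theta . p); theta lives on a circle, p on a line.
(define (cylinder-close? eps)
  (lambda (x y)
    (let ((dtheta (wrap-angle (- (car x) (car y))))
          (dp (- (cdr x) (cdr y))))
      (< (+ (* dtheta dtheta) (* dp dp)) (* eps eps)))))

((cylinder-close? 0.1) '(3.1 . 0.0) '(-3.1 . 0.0))   ; => #t: about 0.083 apart across theta = pi
((cylinder-close? 0.1) '(0.0 . 0.0) '(1.0 . 0.0))    ; => #f
```

Without the wrap, the first pair would appear 6.2 apart even though the points sit on either side of $\theta = \pi$ on the cylinder.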
Figure 4.9 shows a computation of the homoclinic tangle for the
driven pendulum. The parameters are $m = 1$ kg, $g = 9.8$ m s$^{-2}$,
$l = 1$ m, $\omega = 4.2\sqrt{g/l}$, and amplitude $A = 0.05$ m. For reference,
figure 4.9 also shows a surface of section for these parameters on the
same scale.


Exercise 4.6: Computing homoclinic tangles 
a. Compute stable and unstable manifolds for the standard map. 
b. Identify the features on the homoclinic tangle that entered the
argument about its existence, such as the central crossing of the stable and
unstable manifolds, etc.


c. Investigate the errors in the process. Are the computed manifolds 
really correct or a figment of wishful thinking? One could imagine that 
the errors are exponential and the computed manifolds have nothing to 
do with the actual manifolds. 


d. How much actual space is taken up by the homoclinic tangle? Con- 
sider a value of the coupling constant K = 0.8. Does the homoclinic 
tangle actually fill out the apparent chaotic zone? 


4.4 Integrable Systems 


Islands appear near commensurabilities, and commensurabilities 
are present even in integrable systems. In integrable systems an 
infinite number of periodic orbits are associated with each com- 
mensurability, but upon perturbation only a finite number of pe- 
riodic orbits survive. How does this happen? First we have to 
learn more about integrable systems. 

If an n degree of freedom system has n independent conserved 
quantities then the solution of the problem can be reduced to 
quadratures. Such a system is called integrable. Typically, the 
phase space of integrable systems is divided into regions of qual- 
itatively different behavior. For example, the motion of a pendu- 
lum is reducible to quadratures, and has three distinct types of 
solutions: the oscillating solutions and the clockwise and coun- 
terclockwise circulating solutions. The different regions of the 
pendulum phase space are separated by the trajectories that are 
asymptotic to the unstable equilibrium. It turns out that for any 
system that is reducible to quadratures a set of phase space coor- 
dinates can be chosen for each region of the phase space so that the 
Hamiltonian describing the motion in that region depends only on 
the momenta. Furthermore, if the phase space is bounded then the
generalized coordinates can be chosen to be angles (that are $2\pi$
periodic). The configuration space described by $n$ angles is an
$n$-torus. The momenta conjugate to these angles are called actions.
Such phase space coordinates are called action-angle coordinates.
We will see how to reformulate systems in this way later. Here we
explore the consequences of such a formulation; this formulation is
especially useful for exploring what happens as additional effects
are added to integrable problems.


Figure 4.9 The computed homoclinic tangle for the driven pendulum
exhibits the features described in the text. Notice how the excursions
of the stable and unstable manifolds become longer and thinner as they
approach the unstable fixed point. A surface of section with the same
parameters is also shown.


Orbit types in integrable systems 

Suppose we have a time-independent n degree of freedom system 
that is reducible to quadratures. For each region of phase space 
there is a local formulation of the system so that the evolution 
of the system is described by a time-independent Hamiltonian 
that depends only on the momenta. Suppose further that the 
coordinates are all angles. Let $\theta$ be the tuple of angles, and $J$ be
the tuple of conjugate momenta. The Hamiltonian is


$$
H(t, \theta, J) = f(J). \tag{4.35}
$$
Hamilton’s equations are simply 


$$
\begin{aligned}
DJ(t) &= -\partial_1 H(t, \theta(t), J(t)) = 0 \\
D\theta(t) &= \partial_2 H(t, \theta(t), J(t)) = \omega(J(t)),
\end{aligned}
\tag{4.36}
$$


where $\omega(J) = Df(J)$ is a tuple of frequencies with a component
for each degree of freedom. The momenta are all constant because
the Hamiltonian does not depend on any of the coordinates. The
motion of the coordinate angles is uniform; the rates of change
of the angles are the frequencies $\omega$, which depend only on the
constant momenta. Given initial values $\theta(t_0)$ and $J(t_0)$ at time
$t_0$, the solutions are simple:


$$
\begin{aligned}
J(t) &= J(t_0) \\
\theta(t) &= \omega(J(t_0))(t - t_0) + \theta(t_0).
\end{aligned}
\tag{4.37}
$$


Though the solutions are simple, there are a number of distinct 
orbit types: equilibrium solutions, periodic orbits, and quasiperi- 
odic orbits, depending on the frequency ratios. 

If $\omega(J)$ is zero for some $J$ then $\theta$ and $J$ are both constant, for
any $\theta$. The system is at an equilibrium point.
Figure 4.10 The solid and dotted lines show two periodic trajectories
on the configuration coordinate plane. For commensurate frequencies
the configuration motion is periodic, independent of the initial angles.
In this illustration the frequencies satisfy $3\omega^0(J(t_0)) = 2\omega^1(J(t_0))$.
The orbit closes after 2 cycles of $\theta^0$ and 3 cycles of $\theta^1$, for any
initial $\theta^0$ and $\theta^1$.


A solution is periodic if all the coordinates (and momenta)
of the system return to the initial coordinates (and momenta)
at some later time. Each coordinate $\theta^i$ with nonzero frequency
$\omega^i(J(t_0))$ is periodic with a period $T_i = 2\pi/\omega^i(J(t_0))$. The period
of the system must therefore be an integer multiple $k_i$ of each
of the individual coordinate periods $T_i$. If the system is periodic
with some set of integer multiples, then it is also periodic with
any common factors divided out. Thus the period of the system
is $T = (k_i/d)T_i$, where $d$ is the greatest common divisor of the
integers $k_i$.

For a system with two degrees of freedom a solution is periodic if
there exist relatively prime integers $k$ and $j$ such that
$k\omega^0(J(t_0)) = j\omega^1(J(t_0))$. The period of the system is
$T = 2\pi j/\omega^0(J(t_0)) = 2\pi k/\omega^1(J(t_0))$; the frequency is
$\omega^0(J(t_0))/j = \omega^1(J(t_0))/k$. A periodic motion on the 2-torus
is illustrated in figure 4.10.
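The period rule can be checked with exact arithmetic. This sketch assumes the frequencies are given as exact positive rationals (with floating-point frequencies a commensurability cannot be decided exactly), and the procedure names are hypothetical. It finds the largest frequency $\Omega$ of which every $\omega^i$ is an integer multiple, so that the system period is $T = 2\pi/\Omega$ and coordinate $\theta^i$ completes $\omega^i/\Omega = k_i/d$ cycles in one period.

```scheme
;; Sketch: frequencies are exact positive rationals (an assumption).

;; Largest rational r such that a/r and b/r are both integers.
(define (rational-gcd a b)
  (/ (gcd (numerator a) (numerator b))
     (lcm (denominator a) (denominator b))))

;; Largest Omega of which every frequency is an integer multiple.
(define (system-frequency freqs)
  (let loop ((g (car freqs)) (rest (cdr freqs)))
    (if (null? rest)
        g
        (loop (rational-gcd g (car rest)) (cdr rest)))))

;; Number of cycles each angle completes in one system period.
(define (cycle-counts freqs)
  (let ((omega (system-frequency freqs)))
    (map (lambda (w) (/ w omega)) freqs)))

;; System period T = 2 pi / Omega.
(define (system-period freqs)
  (/ (* 8 (atan 1)) (system-frequency freqs)))

;; Frequencies in the ratio 3 omega^0 = 2 omega^1:
(display (cycle-counts '(2 3)))      ; (2 3)
(newline)
(display (cycle-counts '(1/2 3/4)))  ; (2 3)
(newline)
```

With $\omega^0 = 2$ and $\omega^1 = 3$ the period is $2\pi$, and $\theta^0$ completes 2 cycles while $\theta^1$ completes 3, in agreement with $T = 2\pi j/\omega^0 = 2\pi k/\omega^1$ for $j = 2$, $k = 3$.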
If the frequencies $\omega^i(J(t_0))$ satisfy an integer-coefficient relation
$\sum_i n_i \omega^i(J(t_0)) = 0$, with integers $n_i$ not all zero, we say
that the frequencies satisfy a commensurability. If there is no
commensurability for any such integer coefficients, we say that the
frequencies are linearly independent (with respect to the integers) and
the solution is quasiperiodic. One can prove that for $n$ incommensurate
frequencies all solutions come arbitrarily close to every point in
the configuration space.⁷

For a system with two degrees of freedom the solutions in a
region described by a particular set of action-angle variables are
either equilibrium solutions, periodic solutions, or quasiperiodic
solutions.⁸ For systems with more than two degrees of freedom
there are trajectories that are neither periodic nor
quasiperiodic with $n$ frequencies. These are quasiperiodic with
fewer frequencies and dense over a corresponding lower-dimensional
torus.


Surfaces of section for integrable systems 
As we have seen, in action-angle coordinates the angles move 
with constant angular frequencies, and the momenta are constant. 
Thus surfaces of section in action-angle coordinates are particu- 
larly simple. We can make surfaces of section for time-independent 
two degree of freedom systems or one degree of freedom systems 
with periodic drive. In the latter case, one of the angles in the 
action-angle system is the phase of the drive. We make surfaces 
of section by accumulating points in one pair of canonical coordi- 
nates as the other coordinate goes through some particular value, 
such as zero. If we plot the section points with the angle coordi- 
nate on the abscissa and the conjugate momentum on the ordinate 
then the section points for all trajectories lie on horizontal lines, 
as illustrated in figure 4.11. 

For definiteness, let the plane of the surface of section be the
$(\theta^0, J_0)$ plane, and the section condition be $\theta^1 = 0$. The other
momentum $J_1$ is chosen so that all the trajectories have the same
energy. The momenta are all constant, so for a given trajectory
all points that are generated are constrained to a line of constant
$J_0$.


Figure 4.11 On surfaces of section for systems in action-angle
coordinates all trajectories generate points on horizontal lines.
Trajectories with frequencies that are commensurate with the sampling
frequency produce a finite number of points. Trajectories with
frequencies that are incommensurate with the sampling frequency fill
out a horizontal line densely.


7 Motion with $n$ incommensurate frequencies is dense on the $n$-torus.
Furthermore, such motion is ergodic on the $n$-torus. This means that time
averages of time-independent phase space functions computed along
trajectories are equal to the phase space average of the same function
over the torus.


8 For time-independent systems with two degrees of freedom the boundary
between regions described by different action-angle coordinates has
asymptotic solutions and unstable periodic orbits or equilibrium points.
The solutions on the boundary are not described by the action-angle
Hamiltonian.

The time between section points is the period of $\theta^1$,
$\Delta t = 2\pi/\omega^1(J(t_0))$, because a section point is generated for
every cycle of $\theta^1$. The angle between successive points on the section
is $\omega^0(J(t_0))\Delta t = \omega^0(J(t_0))\,2\pi/\omega^1(J(t_0)) = 2\pi\nu(J(t_0))$,
where $\nu(J) = \omega^0(J)/\omega^1(J)$ is called the rotation number of the
trajectory. Let $\hat\theta(i)$ and $\hat J(i)$ be the $i$th point ($i$ is an integer)
in a sequence of points on the surface of section generated by a solution
trajectory:

$$
\begin{aligned}
\hat\theta(i) &= \theta^0(i\Delta t + t_0) \\
\hat J(i) &= J_0(i\Delta t + t_0),
\end{aligned}
\tag{4.38}
$$
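where the system is assumed to be on the section at $t = t_0$. Along
a trajectory, the map $T$ from one section point $(\hat\theta(i), \hat J(i))$ to
the next, $(\hat\theta(i+1), \hat J(i+1))$, is of the form:⁹

$$
\begin{pmatrix} \hat\theta(i+1) \\ \hat J(i+1) \end{pmatrix}
= T \begin{pmatrix} \hat\theta(i) \\ \hat J(i) \end{pmatrix}
= \begin{pmatrix} \hat\theta(i) + 2\pi\hat\nu(\hat J(i)) \\ \hat J(i) \end{pmatrix}.
\tag{4.39}
$$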

As a function of the action on the section, the rotation number is
$\hat\nu(\hat J(0)) = \nu(\hat J(0), J_1(t_0))$, where $J_1(t_0)$ has the value
required to be on the section, as for example by giving the correct
energy. If the rotation number function $\hat\nu$ is strictly monotonic in
the action coordinate on the section then the map is called a twist map.¹⁰
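The section map (4.39) can be written in the map convention the text uses for its Scheme procedures, in which a map takes an angle, a momentum, and two continuations (result and fail). The sketch below is a minimal illustration, assuming a hypothetical rotation-number function nu of the section action; with the linear choice $\hat\nu(\hat J) = \hat J$ the map is a twist map, and a point with rational rotation number $j/k$ returns to itself after $k$ iterations.

```scheme
(define two-pi (* 8 (atan 1)))

;; Reduce an angle to [0, 2pi).
(define (angle-mod x)
  (- x (* two-pi (floor (/ x two-pi)))))

;; Section map (4.39) for an integrable system, in continuation-passing
;; style. nu is a (hypothetical) rotation-number function of the action.
(define (integrable-section-map nu)
  (lambda (theta J result fail)
    ;; The angle advances by 2 pi nu(J); the action never changes.
    (result (angle-mod (+ theta (* two-pi (nu J))))
            J)))

;; Apply a continuation-passing map n times; return (theta J).
(define (iterate-map the-map n theta J)
  (if (= n 0)
      (list theta J)
      (the-map theta J
               (lambda (ntheta nJ)
                 (iterate-map the-map (- n 1) ntheta nJ))
               (lambda () 'fail))))

;; Linear twist: nu(J) = J, strictly increasing in J.
(define T (integrable-section-map (lambda (J) J)))

;; With J = 1/3 the angle returns to 0.5 (up to rounding) after 3 steps.
(display (iterate-map T 3 0.5 1/3))
(newline)
```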

On a surface of section the different types of orbits generate 
different patterns. If the orbit is an equilibrium solution then the 
initial point on the surface of section is a fixed point. The system 
just stays there. 

If the two frequencies are commensurate then the trajectory is
periodic and there are only a finite number of points generated on
the surface of section. Both of the periodic solutions illustrated in
figure 4.10 generate three points on the surface of section defined by
$\theta^1 = 0$. If the frequencies are commensurate they satisfy a relation
of the form $k\omega^0(J(t_0)) = j\omega^1(J(t_0))$, where
$J(t_0) = (\hat J(0), J_1(t_0))$ is the initial and constant value of the
momentum tuple. The motion is periodic with frequency
$\omega^0(J(t_0))/j = \omega^1(J(t_0))/k$, so the period is
$2\pi j/\omega^0(J(t_0)) = 2\pi k/\omega^1(J(t_0))$. Thus this periodic
orbit generates $k$ points on this surface of section. For trajectories
with commensurate frequencies the rotation number is rational:
$\hat\nu(\hat J(0)) = \nu(\hat J(0), J_1(t_0)) = j/k$. The coordinate $\theta^1$ makes
$k$ cycles while the coordinate $\theta^0$ makes $j$ cycles (figure 4.10 shows a
system with a rotation number of 2/3). The frequencies depend
on the momenta but not on the coordinates, so the motion is
periodic with the same period and rotation number for all initial
angles given these momenta. Thus there is a continuous family of
periodic orbits with different initial angles.
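As a worked instance of the counting argument, take rotation number $j/k = 2/3$. Iterating (4.39) from any initial angle gives

$$
\begin{aligned}
\hat\theta(1) &= \hat\theta(0) + \tfrac{4\pi}{3}, \\
\hat\theta(2) &= \hat\theta(0) + \tfrac{8\pi}{3} \equiv \hat\theta(0) + \tfrac{2\pi}{3} \pmod{2\pi}, \\
\hat\theta(3) &= \hat\theta(0) + 4\pi \equiv \hat\theta(0) \pmod{2\pi},
\end{aligned}
$$

so the orbit leaves exactly $k = 3$ distinct points, spaced $2\pi/3$ apart on the line of constant $\hat J$, whatever the value of $\hat\theta(0)$. Changing $\hat\theta(0)$ slides all three points together, which is the continuous family of periodic orbits just described.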

If the two frequencies are incommensurate, then the 2-torus
is filled densely. Thus the line on which the section points are
generated is filled densely. Again, this is the case for any initial
coordinates, because the frequencies depend only on the momenta.
There are infinitely many such orbits, which are distinct for a given
set of frequencies.¹¹


9 The coordinate $\hat\theta(i)$ is an angle. It can be brought to a standard
interval such as 0 to $2\pi$.


10 Actually, to be a twist map we require $|D\hat\nu(\hat J)| > K > 0$ over some
interval of $\hat J$.


4.5 Poincaré-Birkhoff Theorem 


How does this picture change if we add additional effects? 

One peculiar feature of the orbits in integrable systems is that 
there are continuous families of periodic orbits. The initial angles
do not matter; the frequencies depend only on the actions. Contrast
this with our earlier experience with surfaces of section in which 
periodic points are isolated, and associated with island chains. 
Here we investigate periodic orbits of near-integrable systems, and 
find that typically for each rational rotation number there are a 
finite number of periodic points, half of which are linearly stable 
and half linearly unstable. 

Consider an integrable system described in action-angle
coordinates by the Hamiltonian $H_0(t, \theta, J) = f(J)$. We add some small
additional effect described by the term $H_1$ in the Hamiltonian

$$
H = H_0 + \epsilon H_1. \tag{4.40}
$$


An example of such a system is the periodically driven pendulum 
with small drive amplitude. For zero drive amplitude the driven 
pendulum is integrable, but not for small drive. Unfortunately, 
we do not yet have the tools to develop action-angle coordinates 
for the pendulum. A simpler problem that is already in action- 
angle form is the driven rotor, which is just the driven pendulum 
with gravity turned off. We can implement this by turning our 
driven pendulum on its side, making the plane of the pendulum 
horizontal. A Hamiltonian for the driven rotor is 


$$
H(t, \theta, p_\theta) = \frac{p_\theta^2}{2ml^2} + mlA\omega^2 \cos\omega t \cos\theta, \tag{4.41}
$$


where $A$ is the amplitude of the drive with frequency $\omega$, $m$ is
the mass of the bob, and $l$ is the length of the rotor. For zero
amplitude, the Hamiltonian is already in action-angle form in that
it depends only on the momentum $p_\theta$ and the coordinate is an
angle.


11 The section points for any particular orbit are countable and dense,
but they have zero measure on the line.
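For reference, Hamilton's equations for the driven rotor follow directly from (4.41), using $D\theta = \partial_2 H$ and $Dp_\theta = -\partial_1 H$ as in (4.36):

$$
\begin{aligned}
D\theta(t) &= \frac{p_\theta(t)}{ml^2}, \\
Dp_\theta(t) &= mlA\omega^2 \cos\omega t \,\sin\theta(t).
\end{aligned}
$$

With $A = 0$ the momentum $p_\theta$ is constant and $\theta$ advances uniformly at the rate $p_\theta/(ml^2)$, which is the action-angle behavior described in the preceding paragraph.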

For an integrable system, the map generated on the surface 
of section is of the form (4.39). With the addition of a small
perturbation to the Hamiltonian, small corrections are added to 
the map 


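$$
\begin{pmatrix} \hat\theta(i+1) \\ \hat J(i+1) \end{pmatrix}
= T_\epsilon \begin{pmatrix} \hat\theta(i) \\ \hat J(i) \end{pmatrix}
= \begin{pmatrix} \hat\theta(i) + 2\pi\hat\nu(\hat J(i)) + O(\epsilon) \\ \hat J(i) + O(\epsilon) \end{pmatrix}.
\tag{4.42}
$$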


Both the map $T$ and the perturbed map $T_\epsilon$ are area preserving
because the maps are generated as surfaces of section for Hamil- 
tonian systems. 

Suppose we are interested in determining whether periodic
orbits of a particular rational rotation number $\hat\nu(\hat J(0)) = j/k$ exist
in some interval of the action $\alpha < \hat J(0) < \beta$. If the rotation number
is strictly monotonic in this interval and orbits with the rotation
number $\hat\nu(\hat J(0))$ occur in this interval for the unperturbed map $T$,
then by a simple construction we can show that periodic orbits
with this rotation number also exist for $T_\epsilon$ for sufficiently small $\epsilon$.

If a point is periodic for rational rotation number
$\hat\nu(\hat J(0)) = j/k$, with relatively prime $j$ and $k$, we expect $k$
distinct images of the point to appear on the section. So if we consider
the $k$th iterate of the map then the point is a fixed point of the map.
For rational rotation number $j/k$ the map $T^k$ has a fixed point for
every initial angle.

The rotation number of the map $T$ is strictly monotonic. Suppose
for definiteness that the rotation number $\hat\nu(\hat J(0))$ increases
with $\hat J(0)$. For some $J^*$ such that $\alpha < J^* < \beta$ the rotation
number is $j/k$, and $(\theta^*, J^*)$ is a fixed point of $T^k$ for any initial
$\theta^*$. For $J^*$ the rotation number of $T^k$ is zero (modulo an integer).
The rotation number of the map $T$ is monotonically increasing, so for
$\hat J(0) > J^*$ the rotation number of $T^k$ is positive, and for
$\hat J(0) < J^*$ the rotation number of $T^k$ is negative, as long as
$\hat J(0)$ is not too far from $J^*$. See figure 4.12.
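The sign change can be made explicit by expanding the unperturbed $k$th iterate about $J^*$, assuming $\hat\nu$ is differentiable there. Under $T^k$ the action is unchanged and

$$
\hat\theta \;\mapsto\; \hat\theta + 2\pi k\,\hat\nu(\hat J)
= \hat\theta + 2\pi j + 2\pi k\, D\hat\nu(J^*)\,(\hat J - J^*) + O\big((\hat J - J^*)^2\big),
$$

using $k\hat\nu(J^*) = j$. The term $2\pi j$ is a whole number of turns, so modulo $2\pi$ the net angular advance of $T^k$ is approximately $2\pi k\, D\hat\nu(J^*)(\hat J - J^*)$. With $D\hat\nu(J^*) > 0$ this is positive above $J^*$ and negative below it, which is the situation sketched in figure 4.12.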

Now consider the map $T_\epsilon^k$. In general, for small $\epsilon$, points map
to slightly different points under $T_\epsilon^k$ than they do under $T^k$, but
not too different. So we can expect that there is still some interval
near $J^*$ such that for $\hat J(0)$ in the upper end of the interval $T_\epsilon^k$
maps points to larger $\theta^0$, and for points in the lower end of the
interval it maps them to smaller $\theta^0$, as we saw for $T^k$. If this is the
case then for every $\hat\theta(0)$ there is a point somewhere in the interval,
some $J_+(\hat\theta(0))$, for which $\theta^0$ does not change, by continuity. These
are not fixed points because the momentum $J_0$ generally changes. See
figure 4.13.

The map is continuous, so we can expect that $J_+$ is a continuous
function of $\theta^0$. As we let $\theta^0$ vary through $2\pi$, either this
function is periodic or not. That it must be periodic is a
consequence of area preservation.¹² So the set of points that do not
change $\theta^0$ under $T_\epsilon^k$ forms some periodic function of $\theta^0$. Call
this curve $C_0$. See figure 4.14.


Figure 4.12 The map $T^k$ has a line of fixed points if the rotation
number is the rational $j/k$. Points above this line map to larger $\theta^0$;
points below this line map to smaller $\theta^0$.


Figure 4.13 The map $T_\epsilon^k$ is slightly different from $T^k$, but above the
central region points still map to larger $\theta^0$ and below the central
region they map to smaller $\theta^0$. By continuity there are points between
for which $\theta^0$ does not change.


Figure 4.14 The curve $C_0$ of points that map to the same $\theta^0$ under
$T_\epsilon^k$ is indicated by the solid line. The image of this curve, $C_1$,
under $T_\epsilon^k$ is the dotted curve. Area preservation implies these curves
cross.

The map $T_\epsilon^k$ takes the curve $C_0$ to another curve $C_1$, which, like
$C_0$, is continuous and periodic. The two curves $C_0$ and $C_1$ must
cross each other, as a consequence of area preservation. How do we
see this? Typically, there is a lower boundary or upper boundary
in $J_0$ for the evolution. In some situations, we have such a lower
boundary because $J_0$ cannot be negative. For example, in action-angle
variables for motion near an elliptic fixed point we will see
that the action is the area enclosed on the phase plane, which
cannot be negative. For others, we might use the fact that there
are invariant curves for large positive or negative $J_0$. In any case,
suppose there is such a barrier $B$. Then, the area of the region
between the barrier and $C_0$ must be equal to the area of the image
of this region, which is the region between the barrier and the
curve $C_1$. So if at any point the two curves $C_0$ and $C_1$ do not
coincide, then they must cross to contain the same area. In fact,
they must cross an even number of times: they are both periodic,
so if they cross once they must cross again to get back to
the same side they started on. The points at which the curves $C_0$
and $C_1$ cross are fixed points because the angle does not change
(that is what it means to be on $C_0$) and the action does not change
(that is what it means for $C_0$ and $C_1$ to be the same at this point).
So we have deduced that there must be an even number of fixed
points of $T_\epsilon^k$. For each fixed point of $T_\epsilon^k$ there are $k$ images of
this fixed point under $T_\epsilon$ on the surface of section.


12 If $J_+$ were not periodic in $\theta^0$ then it would have to spiral. Suppose
it spirals. The region enclosed by two successive turns of the spiral is
mapped to a region between successive turns of the spiral further down
the spiral. The map preserves area, so the spiral cannot asymptote, but
must progress infinitely down the cylinder. This is impossible because of
the twist condition: sufficiently far down the cylinder the rotation
number is too different to allow the angle to be the same under $T_\epsilon^k$.
So $J_+$ does not spiral.


Figure 4.15 The fixed point on the left is linearly unstable. The one
on the right is linearly stable.

We can deduce the stability of these fixed points just from the 
construction. The fixed points come in two types, elliptic and 
hyperbolic. An elliptic fixed point appears where the flow is around
the fixed point: the map from $C_0$ to $C_1$ can be continued along the
background flow to make a closed curve. A hyperbolic fixed point
appears where, if we follow the map from $C_0$ to $C_1$, we enter the
background flow in such a way as to leave the fixed point. So just
from the way the arrows connect we can determine the character 
of the fixed point. See figure 4.15. 

As we develop a Poincaré section, we find that some orbits leave 
traces that circulate around the stable fixed points, resulting in the 
Poincaré-Birkhoff islands. If we look at a particular island we see 
that orbits in the island circulate around the fixed point at a rate 
that is monotonically dependent upon the distance from the fixed 
point. In the vicinity of the fixed point the evolution is governed 
by a twist map. So the entire Poincaré-Birkhoff construction can 
be carried out again. We expect that there will be concentric fam- 
ilies of stable periodic points surrounded by islands and separated 
by separatrices emanating from unstable periodic points. Around
each of these stable periodic orbits, the construction is repeated. 
So the Poincaré-Birkhoff construction is recursive, leading to the 
development of an infinite hierarchy of structure. 


4.5.1 Computing the Poincaré-Birkhoff Construction 


There are so many conditions in our construction of the fixed 
points that one might be suspicious. We can make the construc- 
tion more convincing by actually computing the various pieces 
for a specific problem. Consider the periodically driven rotor, 
with Hamiltonian (4.41). We set $m = 1$ kg, $l = 1$ m, $A = 0.1$ m, and
$\omega = 4.2\sqrt{9.8}$.

We call points that map to the same angle “radially mapping 
points.” We find them with a simple bisection: 


(define (radially-mapping-points map Jmin Jmax phi eps)
  (bisect
   (lambda (J)
     ;; Angular residual: how far the image of (phi, J) lands from phi.
     ((principal-value pi)
      (- phi (map phi J (lambda (phip Jp) phip) list))))
   Jmin Jmax eps))


The procedure map implements some map, which may be an iterate 
of some more primitive map. We give the procedure an angle phi 
to study and a range of actions Jmin to Jmax to search, and a 
tolerance eps for the solution. 
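The search relies on bisect, principal-value, and a section map, none of which is defined in this section, and the driven-rotor section map itself requires numerically integrating (4.41). The portable sketch below therefore swaps in the Chirikov standard map as a stand-in, assumed here in the form $I' = I + K\sin\theta$, $\theta' = \theta + I'$, together with hypothetical minimal versions of bisect and principal-value. The helper map-iterate builds the $k$th iterate, matching the remark that map may be an iterate of a more primitive map.

```scheme
(define two-pi (* 8 (atan 1)))
(define pi (/ two-pi 2))

;; Hypothetical stand-in: map x into [cut - 2pi, cut).
(define (principal-value cut)
  (lambda (x)
    (let* ((low (- cut two-pi)) (d (- x low)))
      (+ low (- d (* two-pi (floor (/ d two-pi))))))))

;; Hypothetical stand-in: a zero of f in [a, b] by bisection, assuming
;; f changes sign there. Keeps the half whose endpoints differ in sign.
(define (bisect f a b eps)
  (let loop ((a a) (fa (f a)) (b b))
    (let ((m (/ (+ a b) 2.)))
      (if (< (abs (- b a)) eps)
          m
          (let ((fm (f m)))
            (if (<= (* fa fm) 0)
                (loop a fa m)
                (loop m fm b)))))))

;; Standard map (assumed form) in the (theta I result fail) convention.
(define (standard-map K)
  (lambda (theta I result fail)
    (let ((nI (+ I (* K (sin theta)))))
      (result ((principal-value two-pi) (+ theta nI)) nI))))

;; k-fold iterate of a continuation-passing map.
(define (map-iterate the-map k)
  (lambda (theta J result fail)
    (let loop ((i 0) (theta theta) (J J))
      (if (= i k)
          (result theta J)
          (the-map theta J
                   (lambda (nt nJ) (loop (+ i 1) nt nJ))
                   fail)))))

;; As in the text: the action at angle phi whose image has the same angle.
(define (radially-mapping-points the-map Jmin Jmax phi eps)
  (bisect (lambda (J)
            ((principal-value pi)
             (- phi (the-map phi J (lambda (phip Jp) phip) list))))
          Jmin Jmax eps))

;; K = 0.1, phi = pi/2: the image angle is pi/2 + J + 0.1, so J = -0.1.
(display (radially-mapping-points (standard-map 0.1) -1. 1. (/ pi 2) 1e-10))
(newline)
```

For the unperturbed map ($K = 0$) and its third iterate, the radially mapping action is $2\pi/3$ for every angle: the 1/3 rotation number line of fixed points of $T^3$.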

We make a plot of the curves $C_0$ (of initial conditions that map
radially) and $C_1$ (the image of $C_0$) with an appropriate piece of
wrapper code.

In figure 4.16 we show the Poincaré-Birkhoff construction of the 
fixed points for the driven rotor. These particular curves are con- 
structed for the two 1:1 commensurabilities between the rotation 
and the drive. There is one set of fixed points constructed for 
each sense of rotation. The corresponding section is in figure 4.17. 
We see that the section shows the existence of fixed points
exactly where the Poincaré-Birkhoff construction shows the crossing
of the curves $C_0$ and $C_1$. Indeed, we can see that the nature of
the fixed point is clearly reflected in the relative configuration of
the $C_0$ and $C_1$ curves.

In figure 4.18 we show the result for a rotation number of 1/3.
The curves are the radially mapping points for the third iterate
of the section map (solid) and the images of these points (dotted).
These curves are distorted by their proximity to the 1:1
islands shown in figure 4.17. The corresponding section is shown
in figure 4.19.


Figure 4.16 The curves $C_0$ (solid) and $C_1$ (dotted) for the 1:1
commensurability.


Figure 4.17 A surface of section displaying the 1:1 commensurability.


Exercise 4.7: Computing the Poincaré-Birkhoff construction 


Consider figure 3.27. Find the fixed points for the three major island
chains, using the Poincaré-Birkhoff construction. 


4.6 Invariant Curves 


We started with an integrable system, where there are invariant 
curves. Do any invariant curves survive if a perturbation is added? 

The Poincaré-Birkhoff construction for twist maps shows that 
invariant curves with rational rotation number typically do not 
survive perturbation. Upon perturbation the invariant curves 
with rational rotation numbers are replaced by an alternating se- 
quence of stable and unstable periodic orbits. So if there are in- 
variant curves that survive perturbation they must have irrational 
rotation numbers. 

When we added a perturbation, we got chains of alternating 
stable and unstable fixed points for every rational rotation num- 
ber, and each stable fixed point is surrounded by an island that 
occupies some region of the section. Since the rational numbers 
are dense and each occupies a region one might wonder if any 
invariant curve survives the perturbation. Surely there are even 
more irrational rotation numbers to look at, but each irrational is 
arbitrarily close to a rational, so it is not obvious that any invari- 
ant curve can survive an arbitrarily small perturbation. 

Nevertheless, the Kolmogorov-Arnold-Moser (KAM) theorem 
proves invariant curves do exist if the perturbation is small enough, 
so that the perturbed problem is “close enough” to an integrable 
problem, and if the rotation number is “irrational enough.” We 
will not prove this theorem here. Instead we will develop methods 
for finding particular invariant curves. 

Stable periodic orbits have a stable island surrounding them on 
the surface of section. The largest islands are associated with ra- 
tionals with small denominators. In general the size of the island 
is limited to a size that decreases as the denominator increases. 
These islands are a local indication of the effect of the
perturbation. Similarly, the chaotic zones appear near unstable periodic
orbits and their homoclinic tangles. The homoclinic tangle is a
continuous curve so it cannot cross an invariant curve, which is
also continuous. If we are looking for invariant curves that persist
upon perturbation, we would be wise to avoid regions of phase
space where the islands or homoclinic tangles are major features.


Figure 4.18 The curves $C_0$ (solid) and $C_1$ (dotted) for the 1:3
commensurability. The angle runs from $-\pi$ to $\pi$. The momentum runs
from 3.5 to 4.5 in appropriate units.


Figure 4.19 A surface of section displaying the 1:3 commensurability.
The angle runs from $-\pi$ to $\pi$. The momentum runs from 3.5 to 4.5 in
appropriate units.

The Poincaré-Birkhoff islands are ordered by rotation number. 
Because of the twist condition, the rotation number is monotonic 
in the momentum of the unperturbed problem. If there is an 
invariant curve with a given rotation number it is sandwiched 
between island chains associated with rational rotation numbers. 
The rotation number of the invariant curve must be between the 
rotation numbers of the island chains on either side of it. 

The fact that the size of the islands decreases with the size of the 
denominator suggests that invariant curves with rotation numbers 
for which nearby rationals require large denominators are the most 
likely to exist. So we will begin our search for invariant curves by 
examining rotation numbers that are not near rationals with small 
denominators. 

Any irrational can be approximated by a sequence of rationals, 
and for each of these rationals we expect there to be stable and 
unstable periodic orbits with stable islands and homoclinic tan- 
gles. An invariant curve for a given rotation number has the best 
chance of surviving if the size of the islands associated with
each rational approximation is smaller than the separation of the 
islands from the invariant curve with that rotation number. 

For any particular size denominator, the best rational approxi- 
mation to an irrational number is given by an initial segment of a 
simple continued fraction. If the approximating continued fraction 
converges slowly to the irrational number then that number is not 
near rationals with small denominators. Thus, we will look for in- 
variant curves with rotation numbers that have slowly converging 
continued-fraction approximations. The continued fractions that 
converge most slowly have tails that are all one. For example, the 
golden ratio, 


$$
\phi = \frac{1 + \sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}, \tag{4.43}
$$


is just such a number. 
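The continued-fraction claims are easy to check numerically. This sketch (procedure names hypothetical) extracts partial quotients from a floating-point number, so only the first several terms are trustworthy, and builds the convergents $h_n/k_n$ as exact rationals with the standard recurrence $h_n = a_n h_{n-1} + h_{n-2}$, $k_n = a_n k_{n-1} + k_{n-2}$. For $\phi$ every partial quotient is 1 and the convergents are ratios of successive Fibonacci numbers; the rotation number $1 - 1/\phi$ used in the search in the text has partial quotients $0, 2, 1, 1, \ldots$, with the same all-ones tail.

```scheme
;; Partial quotients a0, a1, ... of x (at most n of them). Uses
;; floating point, so later terms eventually become unreliable.
(define (cf-terms x n)
  (let loop ((x x) (n n) (acc '()))
    (if (= n 0)
        (reverse acc)
        (let* ((a (floor x))
               (frac (- x a))
               (acc (cons (exact a) acc)))
          (if (= frac 0)
              (reverse acc)
              (loop (/ 1 frac) (- n 1) acc))))))

;; Convergents h_n/k_n as exact rationals, from the recurrence
;; h_n = a_n h_{n-1} + h_{n-2},  k_n = a_n k_{n-1} + k_{n-2}.
(define (convergents terms)
  (let loop ((ts terms) (h1 1) (h2 0) (k1 0) (k2 1) (acc '()))
    (if (null? ts)
        (reverse acc)
        (let ((h (+ (* (car ts) h1) h2))
              (k (+ (* (car ts) k1) k2)))
          (loop (cdr ts) h h1 k k1 (cons (/ h k) acc))))))

(define golden-mean (/ (+ 1 (sqrt 5)) 2))

(display (cf-terms golden-mean 6))                ; (1 1 1 1 1 1)
(newline)
(display (convergents (cf-terms golden-mean 6)))  ; (1 2 3/2 5/3 8/5 13/8)
(newline)
(display (convergents (cf-terms (- 1 (/ 1 golden-mean)) 5)))  ; (0 1/2 1/3 2/5 3/8)
(newline)
```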
4.6.1 Finding Invariant Curves


Invariant curves, if there are any, are characterized by a particular 
rotation number. Points on the invariant curve map to points 
on the invariant curve. Neighboring points map to neighboring 
points, preserving the order. 

On the section for the unperturbed integrable system, the angle
between successive section points is constant: $\Delta\theta = 2\pi\nu(J)$, for
rotation number $\nu(J)$. This map of the circle onto itself with
constant angular step we call a uniform circle map.

For a given rotation number points on the section are laid down 
in a particular order characteristic of the rotation number only. As 
a perturbation is turned on, the invariant curve with a particular 
rotation number will be distorted and the angle between successive 
points will no longer be constant. All that is required to have a 
particular rotation number is that the average change in angle is 
$\Delta\theta$. Nevertheless, the ordering of the points on the surface of
section is preserved, and is characteristic of the rotation number. 

We can use the fact that the sequence of points on the surface of 
section for an invariant curve with a given rotation number must 
have a particular order to find the invariant curve. By evolving 
a candidate initial point with both the perturbed map and the 
uniform circle map and comparing the ordering of the sequence of 
points that are generated we can tell whether the initial point is 
on the desired invariant curve or to which side it is. 

Suppose we have a map that we can iterate to get the points 
on a section. Using the idea of comparing the ordering of points 
with the ordering of the uniform circle map, to indicate how the 
rotation number of our orbit compares to the specified rotation 
number, we can find the momentum, at a specified angle, for the 
invariant curve by bisection search:¹³


(define (find-invariant-curve map rn theta0 Jmin Jmax eps)
  (bisect (lambda (J) (which-way? rn theta0 J map))
          Jmin Jmax eps))


However, we need to be able to determine which way to change 
the momentum to approach the required rotation number. 


13 This depends on the assumptions that Jmin and Jmax bracket the actual
momentum, and that the rotation number is sufficiently continuous in
momentum in that region.
We can evolve the orbits for both maps, producing streams of
points that appear on the section. (The momentum value of the
uniform circle map is superfluous.) Each orbit stream is
transduced into a stream of nonnegative integers. The integers give the
number of points that have been examined in the stream that 
have smaller values of the angle. The streams of integers are then 
compared until a discrepancy is found. The first discrepancy is 
used to compare the rotation numbers of the two orbits, to deter- 
mine which orbit has smaller rotation number. 


(define (which-way? rn theta0 J0 map)
  (compare-streams
   ;; Each position stream starts from an empty ordered list.
   (position-stream theta0
                    (orbit-stream map theta0 J0)
                    '())
   (position-stream theta0
                    (orbit-stream (uniform-circle-map rn)
                                  theta0 J0)
                    '())
   0))


The maps are evolved and built into a stream by a simple recursive 
procedure. The maps are represented in the same way that they 
appeared in section 3.6. 


(define (orbit-stream the-map x y)
  (cons-stream (list x y)
               (the-map x y
                        (lambda (nx ny)
                          (orbit-stream the-map nx ny))
                        (lambda () 'fail))))


The uniform-circle-map is a simple map that has a uniformly 
progressing angle with constant momentum. 


(define (uniform-circle-map rotation-number)
  (let ((delta-theta (* :2pi rotation-number)))
    (lambda (theta y result fail)
      (result ((principal-value :2pi) (+ theta delta-theta))
              y))))


The procedure position-stream produces a stream of index
positions. It maintains an ordered list of angle values, and as each
new angle is added to the list it adds the position index to the
stream. A principal value is applied to each angle to bring it into
the uniform range specified by cut.


(define (position-stream cut orbit list)
  (insert! ((principal-value cut) (car (head orbit)))
           list
           (lambda (nlist position)
             (cons-stream
              position
              (position-stream cut (tail orbit) nlist)))))


Given a new element x to be inserted into an ordered set set, the
procedure insert! calls its continuation with the updated set and
the index at which the new element was inserted.¹⁴

The streams of indices are compared with compare-streams.
The count keeps track of how many points we have already
entered into the circle. When the indices disagree, one stream
has begun to lead the other. The principal-range procedure is
used to determine which is the leader.¹⁵ This is analogous to
using the principal value to determine the direction from one
angle to another on a circle.


14 The insert! procedure is ugly:


(define (insert! x set cont)
  (cond ((null? set)
         (cont (list x) 1))
        ((< x (car set))
         (cont (cons x set) 0))
        (else
         (let lp ((i 1) (lst set))
           (cond ((null? (cdr lst))
                  (set-cdr! lst (cons x (cdr lst)))
                  (cont set i))
                 ((< x (cadr lst))
                  (set-cdr! lst (cons x (cdr lst)))
                  (cont set i))
                 (else
                  (lp (+ i 1) (cdr lst))))))))


15 The principal-range procedure is implemented as follows:


(define ((principal-range period) index)
  (let ((t (- index (* period (floor (/ index period))))))
    (if (< t (/ period 2.))
        t
        (- t period))))
(define (compare-streams s1 s2 count)
  (if (= (head s1) (head s2))
      (compare-streams (tail s1) (tail s2) (+ count 1))
      ((principal-range count) (- (head s2) (head s1)))))
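The rank-comparison idea can be tried out without the scmutils stream machinery. Below is a minimal sketch in portable Scheme under simplifying assumptions. Both maps are rigid rotations, so each angle is computed directly as k·ρ mod 1, measured in turns from the initial angle. Ordinary lists replace streams. The names frac, rank, insert-sorted, centered-mod, and compare-rotations are hypothetical. The sign convention is this sketch's own: the result is positive when the second rotation number is the larger.

```scheme
;; Sketch: compare two rotation numbers by the ranks of successive
;; points of two rigid rotations.  Angles are in turns, measured from
;; the initial angle, so the initial point sits at 0, the bottom of [0, 1).

(define (frac x) (- x (floor x)))          ; reduce to [0, 1)

;; Number of elements of the sorted list lst that are <= x.
(define (rank x lst)
  (let loop ((lst lst) (i 0))
    (if (or (null? lst) (< x (car lst)))
        i
        (loop (cdr lst) (+ i 1)))))

;; Non-destructive insertion into a sorted list.
(define (insert-sorted x lst)
  (cond ((null? lst) (list x))
        ((< x (car lst)) (cons x lst))
        (else (cons (car lst) (insert-sorted x (cdr lst))))))

;; Reduce the integer d modulo period into a range centred on zero,
;; in the manner of principal-range.
(define (centered-mod d period)
  (let ((t (- d (* period (floor (/ d period))))))
    (if (< t (/ period 2)) t (- t period))))

;; Positive if rn-b is larger, negative if rn-a is larger, and 0 if
;; the rank sequences agree for all n steps.
(define (compare-rotations rn-a rn-b n)
  (let loop ((k 1) (seen-a (list 0.)) (seen-b (list 0.)))
    (if (> k n)
        0
        (let* ((a (frac (* k rn-a)))
               (b (frac (* k rn-b)))
               (ra (rank a seen-a))
               (rb (rank b seen-b)))
          (if (= ra rb)
              (loop (+ k 1)
                    (insert-sorted a seen-a)
                    (insert-sorted b seen-b))
              ;; k earlier points have been seen, so ranks run from 1 to k
              (centered-mod (- rb ra) k))))))
```

For example, (compare-rotations 0.38 0.39 100) returns 1 and (compare-rotations 0.39 0.38 100) returns -1. In both cases the ranks first disagree at the thirteenth point. Computing k·ρ directly avoids accumulated rounding. For a general map the angles would instead come from iterating the map, as orbit-stream does.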


Once we have created this mess we can use it to find the initial
momentum (for a given initial angle) for an invariant curve with
a given rotation number. We search the standard map for an
invariant curve with a golden rotation number:¹⁶


(find-invariant-curve (standard-map 0.95) 
(- 1 (/ 1 golden-mean) ) 
0.0 


;Value: 2.114462280273437 


This algorithm, although correct, has terrible performance. The
problem is that each orbit builds a table whose length is the number
of points examined, and each insertion of a new point scans that
table sequentially. The resulting process takes time that grows as
the square of the number of points examined, and space that grows
linearly with it.

However, we observe that as ordering inconsistencies are found
the angles are usually near the initial angle. We can make use
of this to simplify the algorithm. Instead of keeping track of the
whole list of angles, we can keep track of a small list of angles
near the initial angle. In fact, keeping track of the nearest angle
on either side of the initial angle works well. Here is the complete
replacement for the which-way? procedure and its helpers. The
procedure is implemented as a simple loop with state variables for
the two orbits and the endpoints of the intervals. The z variables
keep track of the angle of the uniform circle map; the x variables
keep track of the angle of the map under study. The y variable
is the momentum for the map under study. On each iteration we
determine whether the angle of the uniform circle map is in the
interval of interest below or above the initial angle. If it is in
neither interval, the map is iterated further. If it is in one of the
intervals of interest, we check whether the angle of the other map
is in the corresponding interval. If so, the intervals for the uniform
circle map and the other map are narrowed and the iteration
proceeds. If not, a discrepancy is noted and its sign is reported.
For this process to make sense, the differences between the angles
for successive iterations of both maps must be less than π.


16 There is no invariant curve in the standard map with rotation number
φ = 1.618.... However, 1 − 1/φ has the same continued-fraction tail as φ,
and there are rotation numbers of this size in the standard map.


(define (which-way? rotation-number x0 y0 the-map)
  (let ((pv (principal-value (+ x0 pi))))
    (let lp ((z x0) (zmin (- x0 :2pi)) (zmax (+ x0 :2pi))
             (x x0) (xmin (- x0 :2pi)) (xmax (+ x0 :2pi))
             (y y0))
      (let ((nz (pv (+ z (* :2pi rotation-number)))))
        (the-map x y
                 (lambda (nx ny)
                   (let ((nx (pv nx)))
                     (cond ((< x0 z zmax)
                            (if (< x0 x xmax)
                                (lp nz zmin z nx xmin x ny)
                                (if (> x xmax) 1 -1)))
                           ((< zmin z x0)
                            (if (< xmin x x0)
                                (lp nz z zmax nx x xmax ny)
                                (if (< x xmin) -1 1)))
                           (else
                            (lp nz zmin zmax nx xmin xmax ny)))))
                 (lambda ()
                   (error "Map failed" x y)))))))


With this method of comparing rotation numbers we can expect 
to be able to find the initial conditions for an invariant curve to 
high precision: 


(find-invariant-curve (standard-map 0.95) 
(- 1 (/ 1 golden-mean) ) 


1e-16) 
;Value: 2.1144605494391726 
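Footnote 16 says that 1 − 1/φ has the same continued-fraction tail as φ. A short sketch in portable Scheme can confirm this numerically by computing the first few partial quotients. Here golden-mean is defined locally, and continued-fraction is a hypothetical helper, not part of the text's code. Because it works in floating point, only the first few dozen quotients can be trusted.

```scheme
(define golden-mean (/ (+ 1 (sqrt 5)) 2))

;; First n partial quotients of the continued fraction of x >= 0,
;; computed in floating point (reliable only for modest n).
(define (continued-fraction x n)
  (let loop ((x x) (n n) (acc '()))
    (if (= n 0)
        (reverse acc)
        (let* ((a (floor x))
               (r (- x a))
               (acc (cons (exact a) acc)))
          (if (< r 1e-12)                 ; x was (nearly) an integer
              (reverse acc)
              (loop (/ 1 r) (- n 1) acc))))))
```

(continued-fraction golden-mean 10) evaluates to (1 1 1 1 1 1 1 1 1 1). (continued-fraction (- 1 (/ 1 golden-mean)) 10) evaluates to (0 2 1 1 1 1 1 1 1 1). After the first two quotients, both expansions consist only of 1s.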


Using initial conditions computed in this way we can produce 
the invariant curve. See figure 4.20. If we expand the putative 
invariant curve it should remain a curve for all magnifications—it 
should show no sign of chaotic fuzziness. See figure 4.21. 


Figure 4.20 A surface of section displaying the invariant curve at
rotation number 1 − 1/φ for the standard map with K = 0.95. The
invariant curve is in context: there is a chaotic region that almost eats
the curve. The angle and momentum run from 0 to 2π.


Exercise 4.8: Invariant curves in the standard map 


Find another golden invariant curve in the standard map. Expand it to 
show that it retains the features of a curve at high magnification. 


4.6.2 Dissolution of Invariant Curves 


As can be seen from figure 4.21, the points on an invariant curve
are not uniformly visited, unlike the picture we would get plotting
the angles for the uniform circle map. This is because an interval
may be expanded or compressed when mapped. We can compute
the relative probability density for visitation of each angle on the
invariant curve. A crude way to obtain this result is to count the
number of points that fall into equal incremental angle bins. It is
more effective to use the linear variational map constructed from
the map being investigated to compute the change in incremental
angle from one point to its successor. Since all of the points in a
small interval around the source point are mapped to points (in
the same order) in a small interval around the target point, the
relative probability density at a point is inversely proportional to
the size of the incremental interval around that point. To get this
started we need a good estimate of the initial slope of the invariant
curve. We can estimate the slope by a difference quotient of the
momentum and angle increments for the interval that we used to
refine the momentum of the invariant curve with a given rotation
number. Figures 4.22 and 4.23 show the relative probability
density of visitation as a function of angle for the invariant curve
of golden winding number in the standard map for three different
values of the parameter K. As K increases, certain angles become
less likely. Near K = 0.971635406 some angles are never visited.
But the invariant curve must be continuous. Thus it appears that
for larger K the invariant curve with this rotation number will not
exist. Indeed, if the invariant set persists with the given rotation
number it will have an infinite number of holes (because it has an
irrational winding number). Such a set is sometimes called a
cantorus.


Figure 4.21 Here is a small portion of the same invariant curve shown
in figure 4.20. The curve is magnified by 27 x 107. We see that even at
this magnification the points appear to lie on a line. We also see that
the visitation frequency of points is highly nonuniform.
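The crude binning approach mentioned above can be sketched in portable Scheme. This illustration rests on an assumption: it uses the standard map in the form recalled in exercise 5.5 (I′ = I + K sin θ, θ′ = θ + I′, both modulo 2π). The helper names wrap-angle, standard-map-step, and angle-histogram are hypothetical. It does not implement the more effective variational-map method described in the text.

```scheme
(define pi (* 4 (atan 1)))
(define two-pi (* 2 pi))

;; Reduce an angle to [0, 2pi).
(define (wrap-angle x)
  (- x (* two-pi (floor (/ x two-pi)))))

;; One step of the (assumed) standard map; returns (theta' . I').
(define (standard-map-step K theta I)
  (let* ((I-next (wrap-angle (+ I (* K (sin theta)))))
         (theta-next (wrap-angle (+ theta I-next))))
    (cons theta-next I-next)))

;; Iterate n times from (theta0, I0) and count how many angles land in
;; each of nbins equal angle bins.  Returns a vector of counts.
(define (angle-histogram K theta0 I0 n nbins)
  (let ((counts (make-vector nbins 0)))
    (let loop ((k 0) (theta theta0) (I I0))
      (if (= k n)
          counts
          (let ((bin (min (- nbins 1)      ; guard against rounding to 2pi
                          (exact (floor (* nbins (/ theta two-pi))))))
                (next (standard-map-step K theta I)))
            (vector-set! counts bin (+ 1 (vector-ref counts bin)))
            (loop (+ k 1) (car next) (cdr next)))))))
```

With K = 0 the map is a rigid rotation. In that case (angle-histogram 0. 0. (* two-pi 0.3819660112501051) 10000 10) gives ten counts that are all close to 1000. To study the nonuniform visitation shown in figures 4.22 and 4.23, one would instead start from initial conditions on the invariant curve.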


Exercise 4.9: Dissolution of invariant curves 


As the parameter K is increased beyond the critical value the golden 
invariant curve ceases to exist. Investigate how the method for finding 
invariant curves fails beyond the critical value of K. 


Exercise 4.10: Hard 


Make programs that reproduce figures 4.22 and 4.23. You will
need to develop an effective method of estimating the probability of 
visitation. There is one suggestion of how to do that in the text, but 
you may find a better way. 
Figure 4.22 The relative probability density of visitation as a function
of angle for the invariant curve of golden winding number in the
standard map with K = 0.95 (above) and K = 0.97 (below). As K increases
the function becomes more complex and certain angles become
less likely to be visited.
Figure 4.23 The relative probability density of visitation as a function
of angle for the invariant curve of golden winding number in the
standard map with K = 0.971635406. Here the function is very complex
and appears self-similar. The valleys appear to reach to zero, so there
are discrete angles that are never visited.


5 


Canonical Transformations 


We have done considerable mountain climbing. 
Now we are in the rarefied atmosphere of theories 
of excessive beauty and we are nearing a high 
plateau on which geometry, optics, mechanics, and 
wave mechanics meet on common ground. Only 
concentrated thinking, and a considerable amount 
of re-creation, will reveal the beauty of our subject 
in which the last word has not been spoken. 


Cornelius Lanczos, The Variational Principles of 
Mechanics, (1970, 1982), p. 229. 


One way to simplify the analysis of a problem is to express the 
problem in a form where the solution has a simple representation. 
However, the initial formulation of the problem may be easier to 
express in other terms. For example, the formulation of the prob- 
lem of the motion of a number of gravitating bodies is simple in 
rectangular coordinates, but it is easier to understand aspects of 
the motion in terms of orbital elements, such as the semimajor 
axes, eccentricities, and inclinations of the orbits. The semimajor 
axis and eccentricity of an orbit depend on both the configuration 
and the velocity of the body. Such transformations are more gen- 
eral than those that express changes in configuration coordinates. 
Here we investigate transformations of phase space coordinates 
that involve both the generalized coordinates and the generalized 
momenta. 

Suppose we have two different Hamiltonian systems, and sup- 
pose the trajectories of the two systems are in one-to-one corre- 
spondence with each other. In this case both Hamiltonian systems 
can be mathematical models of the same physical system. Some 
questions about the physical system may be easier to answer by 
reference to one model and others may be easier to answer in 
the other model. For example, it may be easier to formulate the 
physical system in one model and to discover a conserved quan- 
tity in the other. Canonical transformations are maps between 
Hamiltonian systems that preserve the dynamics. 
A canonical transformation is a phase space coordinate
transformation and an associated transformation of the Hamiltonian
such that the dynamics given by Hamilton’s equations in the two
representations describe the same evolution of the system.


5.1 Point Transformations 


A point transformation is a canonical transformation that extends 
a possibly time-dependent transformation of the configuration co- 
ordinates to a phase space transformation. For example, one 
might want to reexpress motion in terms of polar coordinates, 
given a description in terms of rectangular coordinates. In order 
to extend a transformation of the configuration coordinates to a 
phase space transformation we must specify how the momenta and 
Hamiltonian are transformed. 

We have already seen how configuration transformations can 
be carried out in the Lagrangian formulation (see section 1.6.1). 
In that case, we found that if the Lagrangian transforms by com- 
position with the coordinate transformation, then the Lagrange 
equations are equivalent. 

Lagrangians that differ by the addition of a total time deriva- 
tive are equivalent, but have different momenta conjugate to the 
generalized coordinates. So there is more than one way to make 
a canonical extension of a coordinate transformation. 

Here, we find that particular canonical extension of a coordinate 
transformation for which the Lagrangians transform by composi- 
tion with the transformation, with no extra total time derivative 
terms added to the Lagrangian. 

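Let L be a Lagrangian for a system. Consider the coordinate
transformation $q = F(t, q')$. The velocities transform by

$$v = \partial_0 F(t, q') + \partial_1 F(t, q')\, v'. \tag{5.1}$$

We can obtain a Lagrangian in the transformed coordinates by
composition, $L'(t, q', v') = L(t, q, v)$:

$$L'(t, q', v') = L\big(t, F(t, q'), \partial_0 F(t, q') + \partial_1 F(t, q')\, v'\big). \tag{5.2}$$

The momentum conjugate to $q'$ is

$$\begin{aligned}
p' &= \partial_2 L'(t, q', v') \\
   &= \partial_2 L\big(t, F(t, q'), \partial_0 F(t, q') + \partial_1 F(t, q')\, v'\big)\, \partial_1 F(t, q') \\
   &= p\, \partial_1 F(t, q'),
\end{aligned} \tag{5.3}$$

where we have used

$$\begin{aligned}
p &= \partial_2 L(t, q, v) \\
  &= \partial_2 L\big(t, F(t, q'), \partial_0 F(t, q') + \partial_1 F(t, q')\, v'\big).
\end{aligned} \tag{5.4}$$

So, from equation (5.3),¹

$$p = p' \big(\partial_1 F(t, q')\big)^{-1}. \tag{5.5}$$

We can collect these results to define a canonical phase space
transformation C:²

$$\begin{aligned}
(t, q, p) &= C(t, q', p') \\
          &= \Big(t,\; F(t, q'),\; p' \big(\partial_1 F(t, q')\big)^{-1}\Big).
\end{aligned} \tag{5.6}$$

The Hamiltonian is obtained by the Legendre transform

$$\begin{aligned}
H'(t, q', p') &= p' v' - L'(t, q', v') \\
  &= \big(p\, \partial_1 F(t, q')\big)\Big(\big(\partial_1 F(t, q')\big)^{-1}\big(v - \partial_0 F(t, q')\big)\Big) - L(t, q, v) \\
  &= p v - L(t, q, v) - p\, \partial_0 F(t, q') \\
  &= H(t, q, p) - p\, \partial_0 F(t, q'),
\end{aligned} \tag{5.7}$$

using relations (5.1) and (5.5) in the second step. Fully expressed
in terms of the transformed coordinates and momenta, the
transformed Hamiltonian is

$$H'(t, q', p') = H\Big(t, F(t, q'), p' \big(\partial_1 F(t, q')\big)^{-1}\Big) - \Big(p' \big(\partial_1 F(t, q')\big)^{-1}\Big)\, \partial_0 F(t, q'). \tag{5.8}$$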


1 Solving for p in terms of p′ involves multiplying equation (5.3) on the right by
$(\partial_1 F(t, q'))^{-1}$. This inverse is the structure that, when multiplying
$\partial_1 F(t, q')$ on the right, gives an identity structure. Structures representing
linear transformations may be represented in terms of matrices. In this case, the
matrix representation of the inverse structure is the inverse of the matrix
representing the given structure.


2 In chapter 1 the transformation C takes a local tuple in one coordinate system
and gives a local tuple in another coordinate system. In this chapter C is a
phase-space transformation.
The Hamiltonians H’ and H are equivalent because L and L’ have
the same value for a given dynamical state and so have the same
paths of stationary action. In general H and H’ do not have the
same values for a given dynamical state, but differ by a term that
depends on the coordinate transformation.
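A small worked illustration of (5.6) to (5.8) may make the extra term concrete. The example is hypothetical and chosen only for illustration: a one-dimensional free particle of mass m, described in a frame moving with constant velocity u, so that q = F(t, q′) = q′ + ut.

```latex
$$
\begin{aligned}
\partial_0 F(t, q') &= u, \qquad \partial_1 F(t, q') = 1, \\
C(t, q', p') &= (t,\; q' + u t,\; p'), \\
H(t, q, p) &= \frac{p^2}{2m}
  \quad\Longrightarrow\quad
  H'(t, q', p') = \frac{p'^2}{2m} - p' u, \\
Dq' &= \partial_2 H' = \frac{p'}{m} - u, \qquad Dp' = -\partial_1 H' = 0 .
\end{aligned}
$$
```

These agree with differentiating q′ = q − ut directly: Dq′ = Dq − u = p/m − u. The term −p′u is the −p ∂₀F correction of (5.7). The two Hamiltonians differ in value but describe the same motion.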

For time-independent transformations, $\partial_0 F = 0$, there are a
number of simplifications. The relationship of the velocities (5.1)
becomes

$$v = \partial_1 F(t, q')\, v'. \tag{5.9}$$

Comparing this to the relation (5.5) between the momenta, we
see that in this case the momenta transform “oppositely” to the
velocities,³

$$p v = p' \big(\partial_1 F(t, q')\big)^{-1} \partial_1 F(t, q')\, v' = p' v', \tag{5.10}$$

so the product of the momenta and the velocities is not changed
by the transformation. This, combined with the fact that by
construction $L(t, q, v) = L'(t, q', v')$, shows that

$$\begin{aligned}
H(t, q, p) &= p v - L(t, q, v) \\
           &= p' v' - L'(t, q', v') \\
           &= H'(t, q', p').
\end{aligned} \tag{5.11}$$


For time-independent coordinate transformations the Hamiltonian 
transforms by composition with the associated phase-space trans- 
formation. We can also see this from the general relationship (5.7) 
between the Hamiltonians. 


Implementing point transformations 

The procedure F->CT takes a procedure implementing a transfor- 
mation of configuration coordinates F and returns a procedure 
implementing a transformation of phase-space coordinates. 


3 The velocities and the momenta are dual geometric objects with respect to
time-independent point transformations. The velocities comprise a vector field 
on the configuration manifold, and the momenta comprise a covector field on 
the configuration manifold. The invariance of the inner product pv under point 
transformations provides the motivation for the use of superscripts for velocity 
components and subscripts for momentum components in our notation. 
(define ((F->CT F) H-state)
  (up (time H-state)
      (F H-state)
      (* (momentum H-state)
         (invert (((partial 1) F) H-state)))))


Consider a particle moving in a central field. In rectangular 
coordinates a Hamiltonian is: 


(define ((H-central m V) H-state)
  (let ((x (coordinate H-state))
        (p (momentum H-state)))
    (+ (/ (square p) (* 2 m))
       (V (sqrt (square x))))))


Let’s look at this Hamiltonian in polar coordinates. The phase 
space transformation is obtained by applying F->CT to the pro- 
cedure p->r that takes a time and a polar tuple and returns a 
tuple of rectangular coordinates (see section 1.6.1). The trans- 
formation is time-independent so the Hamiltonian transforms by 
composition. In polar coordinates the Hamiltonian is: 


(show-expression
 ((compose (H-central 'm (literal-function 'V))
           (F->CT p->r))
  (up 't
      (up 'r 'phi)
      (down 'p_r 'p_phi))))

$$V(r) + \frac{1}{2}\frac{p_r^2}{m} + \frac{1}{2}\frac{p_\phi^2}{m r^2}$$


There are three terms: the potential energy, which depends on
the radius; the kinetic energy due to radial motion; and the
kinetic energy due to tangential motion. As expected, the angle φ
does not appear, and thus the angular momentum is a conserved
quantity. By going to polar coordinates we have decoupled one of
the two degrees of freedom in the problem.
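A numerical spot check of equation (5.5) for the polar-to-rectangular transformation may help. The sketch below is written in portable Scheme without scmutils. It forms the Jacobian ∂₁F of (r, φ) ↦ (r cos φ, r sin φ), then computes the rectangular momenta as the row vector p′ = (p_r, p_φ) times the inverse of that Jacobian. Finally it compares the rectangular and polar forms of the central-field Hamiltonian. All procedure names are hypothetical, and the potential used in the check is an arbitrary choice.

```scheme
;; Sketch: check p = p' (d1 F)^(-1), equation (5.5), for polar -> rectangular.

;; d1 F for F(r, phi) = (r cos phi, r sin phi), as a list of rows.
(define (polar-jacobian r phi)
  (list (list (cos phi) (- (* r (sin phi))))
        (list (sin phi) (* r (cos phi)))))

;; Inverse of a 2x2 matrix given as a list of rows.
(define (inverse-2x2 m)
  (let ((a (car (car m))) (b (cadr (car m)))
        (c (car (cadr m))) (d (cadr (cadr m))))
    (let ((det (- (* a d) (* b c))))
      (list (list (/ d det) (/ (- b) det))
            (list (/ (- c) det) (/ a det))))))

;; Row vector (p1 p2) times a 2x2 matrix m.
(define (row*matrix p m)
  (list (+ (* (car p) (car (car m))) (* (cadr p) (car (cadr m))))
        (+ (* (car p) (cadr (car m))) (* (cadr p) (cadr (cadr m))))))

;; Rectangular momenta (px py) from a polar state, equation (5.5).
(define (polar->rect-momenta r phi p-r p-phi)
  (row*matrix (list p-r p-phi)
              (inverse-2x2 (polar-jacobian r phi))))

;; Central-field Hamiltonian in rectangular and in polar form.
(define (H-rect m V x y px py)
  (+ (/ (+ (* px px) (* py py)) (* 2 m))
     (V (sqrt (+ (* x x) (* y y))))))

(define (H-polar m V r p-r p-phi)
  (+ (/ (* p-r p-r) (* 2 m))
     (/ (* p-phi p-phi) (* 2 m r r))
     (V r)))
```

For any sample state, such as r = 2, φ = 0.7, p_r = 0.3, p_φ = 1.1, the two Hamiltonians agree up to rounding. This is what composition with C requires for a time-independent transformation. The computed p_x also equals p_r cos φ − (p_φ/r) sin φ.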


Exercise 5.1: Rotations 


Let q and q’ be rectangular coordinates that are related by a rotation
R: $q = R q'$. The Lagrangian for the system is $L(t, q, v) = \tfrac{1}{2} m v^2 - V(q)$.
Find the corresponding phase space transformation C. Compare the
transformation equations for the rectangular components of the
momenta to those for the rectangular components of the velocities. Are
you surprised, considering equation (5.10)?


5.2 General Canonical Transformations 


Although we have shown how to extend any point transformation 
of the configuration space to a canonical transformation, there are 
other ways to construct canonical transformations. How do we 
know if we have a canonical transformation? To test if a trans- 
formation is canonical we may use the fact that if the transfor- 
mation is canonical then Hamilton’s equations of motion for the 
transformed system and the original system will be equivalent. 
Consider a Hamiltonian H and a phase space transformation
C. The transformation C transforms the phase space path
$\sigma'(t) = (t, q'(t), p'(t))$ into $\sigma(t) = (t, q(t), p(t))$:

$$\sigma = C \circ \sigma'. \tag{5.12}$$

The rates of change of the phase-space coordinates are
transformed by the derivative of the transformation:

$$D\sigma = D(C \circ \sigma') = (DC \circ \sigma')\, D\sigma'. \tag{5.13}$$

Let $D_H$ be the phase-space derivative operator

$$D_H H(t, q, p) = \big(1, \partial_2 H(t, q, p), -\partial_1 H(t, q, p)\big). \tag{5.14}$$

Hamilton’s equations are

$$D\sigma = D_H H \circ \sigma \tag{5.15}$$

for any realizable phase-space path σ.

The transformation is canonical if the equations of motion
obtained from the new Hamiltonian are the same as those that could
be obtained by transforming the equations of motion derived from
the original Hamiltonian to the new coordinates:

$$D\sigma = (DC \circ \sigma')\, D\sigma' = (DC \circ \sigma')\,(D_H H' \circ \sigma'). \tag{5.16}$$

Comparing equation (5.15) with this, we see

$$D_H H \circ \sigma = (DC \circ \sigma')\,(D_H H' \circ \sigma'). \tag{5.17}$$

Using $\sigma = C \circ \sigma'$, we find

$$D_H H \circ C \circ \sigma' = (DC \circ \sigma')\,(D_H H' \circ \sigma'). \tag{5.18}$$


This condition must hold for any realizable phase-space path σ′.
Certainly this is true if the following condition holds for every
phase-space point:

$$D_H H \circ C = DC \cdot (D_H H'). \tag{5.19}$$


Any transformation that satisfies equation (5.19) is a canonical 
transformation among phase-space representations of a dynamical 
system. In one phase-space representation the system’s dynamics 
is characterized by the Hamiltonian H’ and in the other by H. 
The idea behind this equation is illustrated in figure 5.1. 




Figure 5.1 A canonical transformation C relates the descriptions of a 
dynamical system in two phase-space coordinate systems. The transfor- 
mation shows how Hamilton’s equations in one coordinate system may 
be derived from Hamilton’s equations in the other coordinate system. 


We can formalize this test as a program: 
(define (canonical? C H Hprime)
  (- (compose (phase-space-derivative H) C)
     (* (D C) (phase-space-derivative Hprime))))


where phase-space-derivative, which was introduced in chapter 3,
implements $D_H$. The transformation is canonical if these
residuals are zero.

If a suitable Hamiltonian for the transformed system is obtained
by composing H with the phase space transformation, we obtain
a more specific formula:

$$D_H H \circ C = DC \cdot D_H(H \circ C), \tag{5.20}$$

and a more specific test:


(define (compositional-canonical? C H)
  (canonical? C H (compose H C)))


Using this test we can verify that the polar-to-rectangular trans- 
formation satisfies the test for a canonical transformation on a 
general central field: 


(print-expression
 ((compositional-canonical?
   (F->CT p->r)
   (H-central 'm (literal-function 'V)))
  (up 't
      (up 'r 'phi)
      (down 'p_r 'p_phi))))
(up 0 (up 0 0) (down 0 0))


The residuals are zero so the transformation is canonical. 


Exercise 5.2: Group properties 


If we say that C is canonical with respect to Hamiltonians H and H’ if
and only if $D_H H \circ C = DC \cdot D_H H'$, then:


a. Show that the composition of canonical transformations is canonical. 
b. Show that composition of canonical transformations is associative. 
c. Show that the identity transformation is canonical. 


d. Show that there is an inverse for a canonical transformation and the 
inverse is canonical. 
5.2.1 Time-independent Canonical Transformations


We have defined a canonical transformation as a transformation 
of phase space coordinates for which Hamilton’s equations trans- 
form appropriately. The conditions that a canonical transforma- 
tion must satisfy (equations 5.19 or 5.20) involve the Hamilto- 
nians. If the Hamiltonians transform by composition and the 
transformation is time-independent then we can tell if the phase 
space transformation is canonical without further reference to the 
Hamiltonian. 

First, we reformulate Hamilton’s equations in a slightly different 
form. Hamilton’s equations are constructed from the derivative of 
the Hamiltonian by rearranging the components and then negating 
some of them. We introduce a shuffle function that does this 
rearrangement: 


$$\mathcal{J}([a, b, c]) = (0, c, -b). \tag{5.21}$$

The argument to $\mathcal{J}$ is a down tuple of components of the derivative
of a Hamiltonian-like function. The shuffle function is linear. We
also introduce a constant function:

$$\mathcal{T}([a, b, c]) = (1, 0, 0). \tag{5.22}$$

With these, Hamilton’s equations can be expressed

$$D\sigma = (\mathcal{J} + \mathcal{T}) \circ DH \circ \sigma. \tag{5.23}$$
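For instance, take the harmonic oscillator Hamiltonian H(t, x, p_x) = p_x²/(2m) + kx²/2, which is treated later in this section. Equations (5.21) to (5.23) give back its Hamilton’s equations:

```latex
$$
\begin{aligned}
DH(t, x, p_x) &= \left[\,0,\; k x,\; \frac{p_x}{m}\,\right], \\
\mathcal{J}\big(DH(t, x, p_x)\big) &= \left(0,\; \frac{p_x}{m},\; -k x\right), \\
(\mathcal{J} + \mathcal{T})\big(DH(t, x, p_x)\big) &= \left(1,\; \frac{p_x}{m},\; -k x\right).
\end{aligned}
$$
```

So along a path, (5.23) says Dt = 1, Dx = p_x/m, and Dp_x = −kx.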


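Using $\mathcal{J}$ and $\mathcal{T}$, the canonical condition (5.20) can be rewritten

$$\begin{align}
(\mathcal{J} + \mathcal{T}) \circ (DH) \circ C
  &= DC \cdot \big((\mathcal{J} + \mathcal{T}) \circ D(H \circ C)\big) \tag{5.24} \\
  &= DC \cdot \big(\mathcal{J} \circ ((DH \circ C) \cdot DC)\big)
     + DC \cdot \big(\mathcal{T} \circ D(H \circ C)\big). \tag{5.25}
\end{align}$$

The value of $\mathcal{T}$ does not depend on its argument: it is always
(1, 0, 0). For time-independent transformations $DC \cdot (1, 0, 0) = (1, 0, 0)$,
so the $\mathcal{T}$ terms on the two sides cancel and the canonical
condition becomes

$$\mathcal{J} \circ (DH) \circ C = DC \cdot \big(\mathcal{J} \circ ((DH \circ C) \cdot DC)\big). \tag{5.26}$$

Applied to a particular phase-space state s this is

$$\mathcal{J}\big(DH(C(s))\big) = DC(s) \cdot \mathcal{J}\big(DH(C(s)) \cdot DC(s)\big). \tag{5.27}$$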




Let Φ be a function that takes a multiplier and produces a linear
transformation that multiplies the multiplier by the argument to
the linear transformation:

$$\Phi(A)(v) = A \cdot v. \tag{5.28}$$

Similarly, let Φ* be a function that takes a multiplier and produces
a linear transformation that multiplies the argument to the linear
transformation by the multiplier:

$$\Phi^*(A)(p) = p \cdot A. \tag{5.29}$$

Using Φ and Φ* we can rewrite condition (5.27) as

$$\mathcal{J}\big(DH(C(s))\big) = \Big(\Phi\big(DC(s)\big) \circ \mathcal{J} \circ \Phi^*\big(DC(s)\big)\Big)\big(DH(C(s))\big). \tag{5.30}$$

This condition is satisfied if

$$\mathcal{J} = \Phi\big(DC(s)\big) \circ \mathcal{J} \circ \Phi^*\big(DC(s)\big). \tag{5.31}$$


A time-independent transformation C is canonical, for Hamiltoni- 
ans that transform by composition, if this condition on its deriva- 
tive DC is satisfied. 

Note that the condition (5.31) does not refer to the Hamiltonian.
This is a remarkable result. Though we have assumed the
Hamiltonians transform by composition with the transformation,
we can decide whether a time-independent phase-space
transformation preserves the dynamics of Hamilton’s equations
without further reference to the details of the dynamical system.

The test is implemented: 


(define ((time-independent-canonical? C) s)
  ((- J-func
      (compose (Phi ((D C) s))
               J-func
               (Phi* ((D C) s))))
   (compatible-shape s)))

(define (J-func DH)
  (up 0 (ref DH 2) (- (ref DH 1))))


(define ((Phi A) v) (* A v))
(define ((Phi* A) w) (* w A))


This procedure tests whether a composition of functions is the
same function as $\mathcal{J}$ by computing their difference when applied to
a general typical argument.⁴ Here they are applied to a structure
with the shape of DH(s), for an arbitrary phase-space state s.⁵

For example, consider the following polar-canonical transformation:

$$(t, x, p_x) = C_\alpha(t, \theta, I), \tag{5.32}$$

where

$$x = \sqrt{\frac{2I}{\alpha}}\, \sin\theta, \tag{5.33}$$

$$p_x = \sqrt{2 \alpha I}\, \cos\theta. \tag{5.34}$$

Here α is an arbitrary parameter that we may set to whatever is
convenient. We define:


(define ((polar-canonical alpha) H-state)
  (let ((t (state->t H-state))
        (theta (coordinate H-state))
        (I (momentum H-state)))
    (let ((x (* (sqrt (/ (* 2 I) alpha)) (sin theta)))
          (p_x (* (sqrt (* 2 alpha I)) (cos theta))))
      (up t x p_x))))


And now we just run our test: 


4 It is in principle impossible to determine in general whether two functions are
the same, but in this case, since Φ(DC(s)) is linear, this test is valid.


5 The shape of DH(s) is a compatible shape to the shape of s: if they are
multiplied the result is a real number. The procedure compatible-shape takes
any structure and produces another structure that is guaranteed to multiply
with the given structure to produce a real number. The structure produced
is filled with unique real literals, so if the residual is zero then the functions
are the same.
(print-expression
 ((time-independent-canonical? (polar-canonical 'alpha))
  (up 't 'theta 'I)))

(up 0 0 0)


So the transformation is canonical.⁶

Of course, not every transformation we might try is canonical.
For example, we might try $x = p \sin\theta$ with $p_x = p \cos\theta$. The
implementation is⁷


(define (a-non-canonical-transform H-state)
  (let ((t (state->t H-state))
        (theta (coordinate H-state))
        (p (momentum H-state)))
    (let ((x (* p (sin theta)))
          (p_x (* p (cos theta))))
      (up t x p_x))))


(print-expression
 ((time-independent-canonical? a-non-canonical-transform)
  (up 't 'theta 'p)))


(up 0 (+ (* -1 p x8102) x8102) (+ (* p x8101) (* -1 x8101))) 


So this transformation is not compositional canonical. 
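For one degree of freedom, condition (5.31) reduces to a familiar statement. A time-independent map (θ, I) ↦ (x, p_x) is canonical, for Hamiltonians that transform by composition, exactly when the determinant of its 2 × 2 Jacobian is 1. The sketch below, in portable Scheme, estimates that determinant with central differences. It is not the scmutils test above, and its names are hypothetical. For the polar-canonical map the determinant works out to cos²θ + sin²θ = 1. For x = p sin θ, p_x = p cos θ it works out to p, consistent with the residual above vanishing only at p = 1.

```scheme
;; Sketch: Jacobian determinant of a one-degree-of-freedom map
;; (q, p) -> (x . px), estimated with central differences of step h.
(define (jacobian-det f q p h)
  (let* ((fq+ (f (+ q h) p))
         (fq- (f (- q h) p))
         (fp+ (f q (+ p h)))
         (fp- (f q (- p h)))
         (dx/dq  (/ (- (car fq+) (car fq-)) (* 2 h)))
         (dpx/dq (/ (- (cdr fq+) (cdr fq-)) (* 2 h)))
         (dx/dp  (/ (- (car fp+) (car fp-)) (* 2 h)))
         (dpx/dp (/ (- (cdr fp+) (cdr fp-)) (* 2 h))))
    (- (* dx/dq dpx/dp) (* dx/dp dpx/dq))))

;; The polar-canonical map (5.33)-(5.34) for a given alpha.
(define (polar-canonical-map alpha)
  (lambda (theta I)
    (cons (* (sqrt (/ (* 2 I) alpha)) (sin theta))
          (* (sqrt (* 2 alpha I)) (cos theta)))))

;; The non-canonical map x = p sin(theta), px = p cos(theta).
(define (non-canonical-map theta p)
  (cons (* p (sin theta)) (* p (cos theta))))
```

(jacobian-det (polar-canonical-map 2.) 0.7 1.3 1e-5) is 1 to within the finite-difference error. (jacobian-det non-canonical-map 0.7 3. 1e-5) is close to 3.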


Harmonic oscillator 

The analysis of the harmonic oscillator illustrates the use of a 
general canonical transformation in the solution of a problem. The 
harmonic oscillator is a mathematical model of a simple spring- 
mass system. The Hamiltonian for a spring mass system with 
mass m and spring constant k is 


$$H(t, x, p_x) = \frac{p_x^2}{2m} + \frac{1}{2} k x^2. \tag{5.35}$$


6 Actually, for I = 0 the transform is not well defined, and so it is not
compositional canonical for that value. This transformation is “locally
compositional canonical” in that it is compositional canonical for nonzero
values of I. We will ignore this essentially topological problem.


7 The mysterious symbols such as x8102 are unique real literals introduced to
test functional equalities. That they appear in a residual demonstrates that
the equality is invalid.
Hamilton’s equations of motion are

$$\begin{aligned}
Dx &= p_x/m \\
Dp_x &= -k x,
\end{aligned} \tag{5.36}$$

giving the second-order system

$$m D^2 x + k x = 0. \tag{5.37}$$

The solution is

$$x(t) = A \sin(\omega t + \varphi), \tag{5.38}$$

where

$$\omega = \sqrt{k/m}, \tag{5.39}$$

and where A and φ are determined by initial conditions.

Let’s try our polar-canonical transformation $C_\alpha$ on the harmonic
oscillator. We substitute expressions (5.33) and (5.34) for
x and $p_x$ in the Hamiltonian, getting our new Hamiltonian:

$$H'(t, \theta, I) = \frac{\alpha I}{m} (\cos\theta)^2 + \frac{k I}{\alpha} (\sin\theta)^2. \tag{5.40}$$

If we choose $\alpha = \sqrt{k m}$, then we obtain

$$H'(t, \theta, I) = \sqrt{\frac{k}{m}}\, I = \omega I, \tag{5.41}$$

and the new Hamiltonian no longer depends on the coordinate.
Hamilton’s equation for I is

$$DI(t) = -\partial_1 H'(t, \theta(t), I(t)) = 0, \tag{5.42}$$

so I is constant. The equation for θ is

$$D\theta(t) = \partial_2 H'(t, \theta(t), I(t)) = \omega. \tag{5.43}$$

So

$$\theta(t) = \omega t + \varphi. \tag{5.44}$$
In the original variables

$$x(t) = \sqrt{2 I(t)/\alpha}\, \sin\theta(t) = A \sin(\omega t + \varphi), \tag{5.45}$$

with the constant $A = \sqrt{2 I(t)/\alpha}$. So we have found the solution
to the problem by making a canonical transformation to new
phase space variables for which the solution is trivial and then
transforming the solutions back to the original variables.
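Equations (5.40) and (5.41) can be checked numerically with a short portable Scheme sketch. With α = √(km), the oscillator Hamiltonian composed with C_α should equal ωI whatever the angle. The procedure names are chosen only for illustration, as are the sample values m = 2 and k = 8, which give α = 4 and ω = 2.

```scheme
;; Oscillator Hamiltonian (5.35).
(define (H-oscillator m k x px)
  (+ (/ (* px px) (* 2 m)) (/ (* k x x) 2)))

;; The polar-canonical transformation (5.33)-(5.34); returns (x . px).
(define (C-alpha alpha theta I)
  (cons (* (sqrt (/ (* 2 I) alpha)) (sin theta))
        (* (sqrt (* 2 alpha I)) (cos theta))))

;; H composed with C-alpha, i.e. H'(t, theta, I) of (5.40).
(define (H-prime m k alpha theta I)
  (let ((xp (C-alpha alpha theta I)))
    (H-oscillator m k (car xp) (cdr xp))))
```

For example, (H-prime 2. 8. 4. theta 0.75) returns 1.5, up to rounding, for every theta. That value is ωI with ω = 2, as (5.41) predicts.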


Exercise 5.3: Trouble in Lagrangian world 


Is there a Lagrangian L’ that corresponds to the harmonic oscillator
Hamiltonian $H'(t, \theta, I) = \omega I$? What could this possibly mean?


Exercise 5.4: Polar-canonical transformations 


Let x, p and θ, I be two sets of canonically conjugate variables. Consider
transformations of the form $x = \beta I^\alpha \sin\theta$ and $p = \beta I^\alpha \cos\theta$. Determine
all α and β for which this transformation is compositional canonical.


Exercise 5.5: Standard map 


Is the standard map a canonical transformation? Recall that the
standard map is $I' = I + K \sin\theta$, with $\theta' = \theta + I'$, both modulo 2π.


5.2.2 Symplectic Transformations 


Condition (5.31) involves the composition of functions, all of which 
are linear transformations. Linear transformations can be repre- 
sented in terms of matrices. A matrix representation is defined 
with respect to a basis. For incremental Hamiltonian states we 
organize the state components as a column matrix of time, the 
components of the coordinates, and the corresponding components 
of the momenta.

Let $\mathbf{J}$ and $\mathbf{DC}$ be the matrix representations of $\mathcal{J}$ and Φ(DC(s)),
respectively, where s is the arbitrary phase-space state at
which the canonical condition is being tested. The matrix
representation of Φ*(DC(s)) is the transpose of $\mathbf{DC}$. In terms of these
matrix representations the test for canonical becomes

$$\mathbf{J} = (\mathbf{DC})\, \mathbf{J}\, (\mathbf{DC})^{\mathsf{T}}. \tag{5.46}$$


We say that a transformation is symplectic if the matrix represen- 
tation of its derivative satisfies this identity. 
The matrix representation of the multiplier for the linear
transformation $\mathcal{J}$ is $\mathbf{J}$. We can find the multiplier for a linear
transformation by taking the derivative of the linear transformation and
evaluating it at an arbitrary point:⁸ $D\mathcal{J}([a, b, c])$. We can obtain a
matrix representation with the utility s->m that takes a multiplier
of a linear transformation and returns a matrix representation of
the multiplier.⁹ The matrix $\mathbf{J}$ depends only on the number of
degrees of freedom. For example, the $\mathbf{J}$ for a system with two
degrees of freedom is:


(print-expression
 (let* ((s (typical-H-state 2))
        (s* (compatible-shape s)))
   (s->m s* ((D J-func) s*) s*)))
(matrix-by-rows (list 0 0 0 0 0)
                (list 0 0 0 1 0)
                (list 0 0 0 0 1)
                (list 0 -1 0 0 0)
                (list 0 0 -1 0 0))


In terms of matrix representations, the test that a transforma- 
tion is symplectic is: 


(define ((symplectic? C) s)
  (let ((s* (compatible-shape s)))
    (let ((J (s->m s* ((D J-func) s*) s*))
          (DCs (s->m s* ((D C) s) s)))
      (- J (* DCs J (m:transpose DCs))))))


For example, we can verify that the point transformation de- 
rived from the coordinate transformation p->r is symplectic: 


⁸ The derivative of a linear transformation is a constant function, independent of the argument.


⁹ The procedure s->m takes three arguments: (s->m s* A s). The s* and s specify the shapes of objects that multiply A on the left and right to give a numerical value; these specify the basis.
(print-expression
 ((symplectic? (F->CT p->r))
  (up 't
      (up 'r 'varphi)
      (down 'p_r 'p_varphi))))


(matrix-by-rows (list 0 0 0 0 0)
                (list 0 0 0 0 0)
                (list 0 0 0 0 0)
                (list 0 0 0 0 0)
                (list 0 0 0 0 0))
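The same test can be cross-checked numerically outside scmutils. The Python below is a minimal sketch that uses plain lists in place of tuples and matrices. It assumes a hand-written canonical extension of p->r: x = r cos φ, y = r sin φ, p_x = p_r cos φ − (p_φ/r) sin φ, p_y = p_r sin φ + (p_φ/r) cos φ. All helper names are hypothetical. The derivative is approximated by central differences, so the residual comes out small rather than exactly zero.

```python
import math


def C_polar_to_rect(s):
    """Hand-written canonical extension of p->r (assumed for this sketch).
    s = [t, r, phi, p_r, p_phi] -> [t, x, y, p_x, p_y]."""
    t, r, phi, p_r, p_phi = s
    c, sn = math.cos(phi), math.sin(phi)
    return [t, r * c, r * sn,
            p_r * c - p_phi * sn / r,
            p_r * sn + p_phi * c / r]


def jacobian(f, x, h=1e-5):
    """Central-difference matrix of Df at x: entry [i][j] ~ d f_i / d x_j."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def J_matrix(n):
    """Matrix of J for n degrees of freedom; component order: time, q's, p's."""
    size = 2 * n + 1
    M = [[0] * size for _ in range(size)]
    for i in range(n):
        M[1 + i][1 + n + i] = 1     # coordinate rows pick out the momentum slots
        M[1 + n + i][1 + i] = -1    # momentum rows pick out minus the coordinate slots
    return M


def symplectic_residual(C, s, n):
    """DC J DC^T - J at the state s; all entries are ~0 for a symplectic C."""
    DC = jacobian(C, s)
    J = J_matrix(n)
    lhs = matmul(matmul(DC, J), transpose(DC))
    return [[lhs[i][j] - J[i][j] for j in range(len(J))] for i in range(len(J))]


state = [0.3, 1.7, 0.6, -0.4, 2.1]   # arbitrary (t, r, phi, p_r, p_phi)
res = symplectic_residual(C_polar_to_rect, state, 2)
print(max(abs(x) for row in res for x in row))   # finite-difference error only
```

J_matrix(2) reproduces the 5 × 5 matrix printed for two degrees of freedom above.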


There is a further simplification available. The elements of the first row and the first column of the matrix representation of J are all zeros, so the first row and the first column of the matrix identity are always satisfied. Thus we need consider only the submatrix associated with the coordinates and the momenta.

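The qp submatrix¹⁰ of dimension 2n × 2n of the matrix $\mathbf{J}$ is called the symplectic unit for n degrees of freedom:


$$ \mathbf{J}_n = \begin{pmatrix} 0_{n \times n} & 1_{n \times n} \\ -1_{n \times n} & 0_{n \times n} \end{pmatrix}. \qquad (5.47) $$


The matrix $\mathbf{J}_n$ satisfies the following identities:


$$ \mathbf{J}_n^T = \mathbf{J}_n^{-1} = -\mathbf{J}_n. \qquad (5.48) $$


A 2n × 2n matrix $\mathbf{A}$ that satisfies the relation


$$ \mathbf{J}_n = \mathbf{A}\, \mathbf{J}_n\, \mathbf{A}^T \qquad (5.49) $$


is called a symplectic matrix.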
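The identities (5.48) and the definition (5.49) are easy to check directly. This is a minimal Python sketch with hypothetical helper names. J_n uses exact integer arithmetic, and the sample matrices use floats. A shear of the (q, p) plane is symplectic. A map that stretches q alone is not.

```python
def symplectic_unit(n):
    """The 2n x 2n matrix [[0, 1], [-1, 0]] of (5.47)."""
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        M[i][n + i] = 1
        M[n + i][i] = -1
    return M


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def is_symplectic_matrix(A, tol=1e-12):
    """Test (5.49): J_n = A J_n A^T, entry by entry within tol."""
    n = len(A) // 2
    J = symplectic_unit(n)
    R = matmul(matmul(A, J), transpose(A))
    return all(abs(R[i][j] - J[i][j]) <= tol
               for i in range(2 * n) for j in range(2 * n))


J2 = symplectic_unit(2)
minus_J2 = [[-x for x in row] for row in J2]
print(transpose(J2) == minus_J2)             # True: J^T = -J
print(matmul(J2, minus_J2) == identity(4))   # True: J(-J) = 1, so J^{-1} = -J

shear = [[1.0, 0.5], [0.0, 1.0]]
stretch = [[2.0, 0.0], [0.0, 1.0]]
print(is_symplectic_matrix(shear), is_symplectic_matrix(stretch))   # True False
```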
Here is an alternate test for whether a transformation is sym- 
plectic: 


¹⁰ The qp submatrix of a (2n + 1)-dimensional square matrix is the 2n-dimensional matrix obtained by deleting the first row and the first column of the given matrix. This can be computed by:


(define (qp-submatrix m)
  (m:submatrix m 1 (m:num-rows m) 1 (m:num-cols m)))
(define ((symplectic-transform? C) s)
  (symplectic-matrix?
   (qp-submatrix
    (s->m (compatible-shape s)
          ((D C) s)
          s))))


(define (symplectic-matrix? M)
  (let ((2n (m:dimension M)))
    (let ((J (symplectic-unit (quotient 2n 2))))
      (- J (* M J (m:transpose M))))))


The procedure symplectic-transform? returns a zero matrix if 
and only if the transformation being tested passes the symplectic 
matrix test. An appropriate symplectic unit matrix of a given size 
is produced by the procedure symplectic-unit. 

The point transformations are symplectic. For example, 


(print-expression
 ((symplectic-transform? (F->CT p->r))
  (up 't
      (up 'r 'theta)
      (down 'p_r 'p_theta))))
(matrix-by-rows (list 0 0 0 0)
                (list 0 0 0 0)
                (list 0 0 0 0)
                (list 0 0 0 0))


Exercise 5.6: Symplectic matrices 


Let $\mathbf{A}$ be a symplectic matrix: $\mathbf{J}_n = \mathbf{A}\mathbf{J}_n\mathbf{A}^T$. Show that $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ are symplectic.


Exercise 5.7: Whittaker transform 
Show that the transformation $q = \log\left(\frac{1}{q'} \sin p'\right)$ with $p = q' \cot p'$ is symplectic.


5.2.3 Time-Dependent Transformations 


We have found that time-independent transformations (involving 
the coordinates and conjugate momenta, but not the time) are 
canonical if the derivative of the transformation is symplectic. 
Let’s return to the calculation of the symplectic condition, but now 
allow explicit time dependence in the transformation equations. 
If the transformation is time-dependent, then it turns out that $H \circ C$ does not make a suitable $H'$. Instead, we assume


$$ H' = H \circ C + K, \qquad (5.50) $$


and look for conditions on K and C that guarantee the transformation is canonical. Equation (5.25), the condition that a transformation is canonical, becomes


$$ (J + T) \circ (DH) \circ C = DC \cdot \big(J \circ ((DH \circ C) \cdot DC + DK)\big) + DC \cdot \big(T \circ (D(H \circ C) + DK)\big). \qquad (5.51) $$


This condition is satisfied if the following two conditions are satisfied:


$$ J \circ (DH) \circ C = DC \cdot \big(J \circ ((DH \circ C) \cdot DC)\big) \qquad (5.52) $$


and


$$ T \circ (DH) \circ C = DC \cdot \big((J + T) \circ (DK)\big) = DC \cdot \big(J \circ (DK)\big) + \partial_0 C. \qquad (5.53) $$


Condition (5.52) is the condition that C is a symplectic trans- 
formation. Condition (5.53) is an auxiliary condition on K. This 
condition does not actually depend on the Hamiltonian H because 
the constant value of T does not depend on the argument. The 
time component is always satisfied; only the coordinate and mo- 
mentum components of this condition constrain K. Evaluated at 
a particular state s (with compatible shape s*) the condition on 
K is 


$$ T(s^*) = DC(s) \cdot \big(J(DK(s))\big) + \partial_0 C(s), \qquad (5.54) $$


explicitly showing that the Hamiltonian H does not enter. 

Thus we can conclude that a time-dependent transformation is 
canonical if its position-momentum part is symplectic and if we 
form the new Hamiltonian by adding an appropriate piece. Note 
that we have not proven that the position-momentum part must 
be symplectic. Rather we have shown that if this part is symplectic 
then the Hamiltonian must be modified in an appropriate way. 

As a program, the test for K is 
(define ((canonical-K? C K) s)
  (let ((s* (compatible-shape s)))
    (- (T-func s*)
       (+ (* ((D C) s) (J-func ((D K) s)))
          (((partial 0) C) s)))))


Rotating coordinates 
Consider a time-dependent point transformation to uniformly ro- 
tating coordinates: 


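$$ q = R(\Omega t)\, q', \qquad (5.55) $$


with components


$$ \begin{aligned} x &= x' \cos(\Omega t) - y' \sin(\Omega t) \\ y &= x' \sin(\Omega t) + y' \cos(\Omega t). \end{aligned} \qquad (5.56) $$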


As a program this is 


(define ((rotating n) state)
  (let ((t (time state))
        (q (coordinate state)))
    (let ((x (ref q 0))
          (y (ref q 1))
          (z (ref q 2)))
      (up (- (* (cos (* n t)) x) (* (sin (* n t)) y))
          (+ (* (sin (* n t)) x) (* (cos (* n t)) y))
          z))))


The extension of this transformation to a phase space transforma- 
tion is 
(define (C-rotating Omega) (F->CT (rotating Omega) )) 


We first verify that the position-momentum part of this time- 
dependent transformation is symplectic: 
(pe
 ((symplectic-transform? (C-rotating 'Omega))
  (up 't
      (coordinate-tuple 'x 'y 'z)
      (momentum-tuple 'px 'py 'pz))))


(matrix-by-rows (list 0 0 0 0 0 0)
                (list 0 0 0 0 0 0)
                (list 0 0 0 0 0 0)
                (list 0 0 0 0 0 0)
                (list 0 0 0 0 0 0)
                (list 0 0 0 0 0 0))


For this transformation the appropriate correction to the Hamil- 
tonian is 


$$ K(\Omega)(t; x, y, z; p_x, p_y, p_z) = -\Omega\,(x p_y - y p_x), \qquad (5.57) $$


which is minus the rate of rotation of the coordinate system multiplied by the z component of the angular momentum. The justification for this will be given in section 5.6. The implementation is:


(define ((K Omega) s)
  (let ((q (coordinate s)) (p (momentum s)))
    (let ((x (ref q 0)) (y (ref q 1))
          (px (ref p 0)) (py (ref p 1)))
      (* -1 Omega (- (* x py) (* y px))))))


Applying the test: 


(print-expression
 ((canonical-K? (C-rotating 'Omega) (K 'Omega))
  (up 't
      (up 'x 'y 'z)
      (down 'p_x 'p_y 'p_z))))
(up 0 (up 0 0 0) (down 0 0 0))


The residuals are zero so this K completes the canonical transfor- 
mation. 
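The residual can be cross-checked numerically. This Python sketch stands in for canonical-K? under stated assumptions. C-rotating is written out by hand: q = R(Ωt)q' as in (5.56), and because R is orthogonal, p = R(Ωt)p'. K is taken from (5.57). The code evaluates T − DC(s)·(J(DK(s))) − ∂₀C(s) with central differences. Since T + J(DK) = (1, ∂K/∂p', −∂K/∂q'), it forms that vector and multiplies it by DC. The first column of DC supplies the ∂₀C term. The names and the sample values of Ω and the state are hypothetical.

```python
import math

OMEGA = 0.7   # sample rotation rate


def C_rotating(s, Omega=OMEGA):
    """[t, x', y', z', px', py', pz'] -> [t, x, y, z, px, py, pz],
    with q = R(Omega t) q' as in (5.56) and p = R(Omega t) p'."""
    t, x1, y1, z1, px1, py1, pz1 = s
    c, sn = math.cos(Omega * t), math.sin(Omega * t)
    return [t,
            c * x1 - sn * y1, sn * x1 + c * y1, z1,
            c * px1 - sn * py1, sn * px1 + c * py1, pz1]


def K_rot(s, Omega=OMEGA):
    """K = -Omega (x p_y - y p_x), equation (5.57)."""
    _, x, y, _, px, py, _ = s
    return -Omega * (x * py - y * px)


def jacobian(f, x, h=1e-5):
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]


def gradient(f, x, h=1e-5):
    return jacobian(lambda y: [f(y)], x, h)[0]


def canonical_K_residual(C, K, s, n):
    """T - DC(s).(1, dK/dp, -dK/dq), i.e. T - DC(s).(J(DK(s))) - d0 C(s)."""
    DC = jacobian(C, s)
    dK = gradient(K, s)
    v = [1.0] + dK[1 + n:] + [-g for g in dK[1:1 + n]]
    DCv = [sum(row[j] * v[j] for j in range(len(v))) for row in DC]
    T = [1.0] + [0.0] * (2 * n)
    return [a - b for a, b in zip(T, DCv)]


s = [0.4, 1.2, -0.5, 0.3, 0.8, 0.1, -0.6]   # arbitrary (t; x', y', z'; px', py', pz')
print(max(abs(r) for r in canonical_K_residual(C_rotating, K_rot, s, 3)))
# With the sign of K flipped the residual is clearly nonzero:
print(max(abs(r) for r in canonical_K_residual(C_rotating, lambda u: -K_rot(u), s, 3)))
```

The second line is a useful guard on sign conventions. The K of (5.57) belongs to the rotation direction of (5.56), and reversing either one breaks the match.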


5.2.4 The Symplectic Condition 


A transformation is symplectic if the qp part of the transformation has a symplectic derivative. This condition can be written simply in terms of Poisson brackets.
The Poisson bracket can be written in terms of J:


$$ \{f, g\} = (Df) \cdot (J \circ (Dg)), \qquad (5.58) $$


as can be seen by writing out the components.

We break the transformation C into position and momentum parts:


$$ q = A(t, q', p') \qquad (5.59) $$

$$ p = B(t, q', p'). \qquad (5.60) $$


In terms of the individual component functions the symplectic condition (5.31) is


$$ \begin{aligned} \delta^i_j &= \{A^i, B_j\} \\ 0 &= \{A^i, A^j\} \\ 0 &= \{B_i, B_j\}, \end{aligned} \qquad (5.61) $$


where $\delta^i_j$ is one if i = j and zero otherwise. These are called the
fundamental Poisson brackets. If a transformation satisfies these 
fundamental Poisson bracket relations then it is symplectic. 
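As an illustration, the fundamental Poisson brackets can be evaluated numerically for the polar-to-rectangular transformation. In this Python sketch the time slot is dropped because the transformation is time-independent, so a phase-space point is [r, φ, p_r, p_φ]. A = (x, y) and B = (p_x, p_y) are written out by hand as in the usual canonical extension. Brackets are taken with respect to the primed (polar) variables, using (5.58) with the time slot omitted: {f, g} = Σᵢ (∂f/∂qⁱ ∂g/∂pᵢ − ∂f/∂pᵢ ∂g/∂qⁱ). All names are hypothetical.

```python
import math


def polar_to_rect(s):
    """[r, phi, p_r, p_phi] -> [x, y, p_x, p_y] (hand-written canonical extension)."""
    r, phi, p_r, p_phi = s
    c, sn = math.cos(phi), math.sin(phi)
    return [r * c, r * sn, p_r * c - p_phi * sn / r, p_r * sn + p_phi * c / r]


def gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def poisson_bracket(f, g, s, n):
    """{f, g}(s) for s = [q^1..q^n, p_1..p_n]."""
    df, dg = gradient(f, s), gradient(g, s)
    return sum(df[i] * dg[n + i] - df[n + i] * dg[i] for i in range(n))


def component(k):
    return lambda s: polar_to_rect(s)[k]


A = [component(0), component(1)]   # x, y
B = [component(2), component(3)]   # p_x, p_y
s = [1.7, 0.6, -0.4, 2.1]          # arbitrary (r, phi, p_r, p_phi)

for i in range(2):
    for j in range(2):
        print(i, j,
              round(poisson_bracket(A[i], B[j], s, 2), 6),
              round(poisson_bracket(A[i], A[j], s, 2), 6),
              round(poisson_bracket(B[i], B[j], s, 2), 6))
```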

We have found that a time-dependent transformation is canon- 
ical if its position-momentum part is symplectic and we modify 
the Hamiltonian by the addition of a suitable K. We can rewrite 
these conditions in terms of Poisson brackets. If the Hamiltonian 
is

$$ H'(t, q', p') = H(t, A(t, q', p'), B(t, q', p')) + K(t, q', p'), \qquad (5.62) $$

the transformation will be canonical if the coordinate-momentum transformation satisfies the fundamental Poisson brackets, and K satisfies:


$$ \begin{aligned} \{A^i, K\} + \partial_0 A^i &= 0 \\ \{B_i, K\} + \partial_0 B_i &= 0. \end{aligned} \qquad (5.63) $$


Exercise 5.8: 


Fill in the details to show that the symplectic condition (5.31) is equiv- 
alent to the fundamental Poisson brackets (5.61) and that the condition 
on K (5.53) is equivalent to the Poisson bracket condition on K (5.63). 
5.3 Invariants of Canonical Transformations


Canonical transformations allow us to change the phase-space co- 
ordinate system that we use to express a problem, preserving the 
form of Hamilton’s equations. If we solve Hamilton’s equations in 
one phase-space coordinate system we can use the transformation 
to carry the solution to the other coordinate system. What other 
properties are preserved by a canonical transformation? 


Noninvariance of pv 

We noted in equation (5.10) that canonical extensions of point 
transformations preserved the value of pv. This does not hold for 
more general canonical transformations. We can illustrate this 
with the transformation just considered. Along corresponding 
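paths x, p_x and θ, I:


$$ x(t) = \sqrt{\frac{2I(t)}{\alpha}}\,\sin\theta(t), \qquad p_x(t) = \sqrt{2I(t)\alpha}\,\cos\theta(t), \qquad (5.64) $$


and so Dx is


$$ Dx(t) = D\theta(t)\,\sqrt{\frac{2I(t)}{\alpha}}\,\cos\theta(t) + DI(t)\,\frac{1}{\sqrt{2I(t)\alpha}}\,\sin\theta(t). \qquad (5.65) $$


The difference of pv and the transformed p'v' is


$$ p_x(t)\,Dx(t) - I(t)\,D\theta(t) = I(t)\,D\theta(t)\,\big(2\cos^2\theta(t) - 1\big) + DI(t)\,\sin\theta(t)\cos\theta(t). \qquad (5.66) $$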


In general this is not zero. The product pv is not necessarily 
invariant under general canonical transformations. 
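Equation (5.66) can be sanity-checked with arbitrary numbers. This minimal Python sketch computes p_x Dx − I Dθ directly from (5.64) and (5.65) and compares it with the right-hand side of (5.66). Here α is the constant of the transformation in (5.64), and the sample values are arbitrary.

```python
import math


def pv_difference_direct(I, theta, DI, Dtheta, alpha):
    """p_x Dx - I Dtheta, with p_x from (5.64) and Dx from (5.65)."""
    p_x = math.sqrt(2 * I * alpha) * math.cos(theta)
    Dx = (Dtheta * math.sqrt(2 * I / alpha) * math.cos(theta)
          + DI * math.sin(theta) / math.sqrt(2 * I * alpha))
    return p_x * Dx - I * Dtheta


def pv_difference_formula(I, theta, DI, Dtheta):
    """Right-hand side of (5.66); alpha has dropped out."""
    return (I * Dtheta * (2 * math.cos(theta) ** 2 - 1)
            + DI * math.sin(theta) * math.cos(theta))


args = (1.3, 0.4, 0.2, 0.9)          # I, theta, DI, Dtheta
direct = pv_difference_direct(*args, alpha=2.5)
print(direct, pv_difference_formula(*args))   # equal to each other, and not zero
```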


Invariance of Poisson brackets 

Here is a remarkable fact: the composition of the Poisson bracket 
of two phase space state functions with a canonical transforma- 
tion is the same as the Poisson bracket of each of the two functions 
composed with the transformation separately. Loosely speaking, 
the Poisson bracket is invariant under canonical phase space trans- 
formations. 
Let f and g be two phase-space state functions. Using the J representation of the Poisson bracket (see section 5.2.4),


$$ \begin{aligned} \{f \circ C, g \circ C\}(s) &= (D(f \circ C))(s) \cdot (J \circ D(g \circ C))(s) \\ &= \big((Df \circ C)(s) \cdot DC(s)\big) \cdot \big(J((Dg \circ C)(s) \cdot DC(s))\big) \\ &= ((Df \circ C)(s)) \cdot \big(J((Dg \circ C)(s))\big) \\ &= (\{f, g\} \circ C)(s), \end{aligned} \qquad (5.67) $$


where the fact that C is symplectic and satisfies equation (5.27) was used in the middle. Abstracted to functions of phase-space states, this is:


$$ \{f \circ C, g \circ C\} = \{f, g\} \circ C. \qquad (5.68) $$
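Equation (5.68) can be spot-checked numerically. This Python sketch uses the time-independent transformation of (5.64), C(θ, I) = (x, p_x) = (√(2I/α) sin θ, √(2Iα) cos θ), with θ as the coordinate and I as its conjugate momentum. The test functions f and g are arbitrary choices made for this illustration, and derivatives are central differences.

```python
import math

ALPHA = 2.5


def C(s):
    """(theta, I) -> (x, p_x), the transformation of (5.64) at fixed time."""
    theta, I = s
    return [math.sqrt(2 * I / ALPHA) * math.sin(theta),
            math.sqrt(2 * I * ALPHA) * math.cos(theta)]


def f(s):   # arbitrary test function of (x, p_x)
    x, p = s
    return x * x * p + math.sin(x)


def g(s):   # another arbitrary test function
    x, p = s
    return p ** 3 - 2 * x * p


def gradient(F, s, h=1e-5):
    out = []
    for j in range(len(s)):
        sp, sm = list(s), list(s)
        sp[j] += h
        sm[j] -= h
        out.append((F(sp) - F(sm)) / (2 * h))
    return out


def pb(F, G, s):
    """One-degree-of-freedom bracket {F, G} = dF/dq dG/dp - dF/dp dG/dq."""
    (Fq, Fp), (Gq, Gp) = gradient(F, s), gradient(G, s)
    return Fq * Gp - Fp * Gq


s_prime = [0.7, 1.4]                                      # arbitrary (theta, I)
lhs = pb(lambda u: f(C(u)), lambda u: g(C(u)), s_prime)   # {f o C, g o C}(s')
rhs = pb(f, g, C(s_prime))                                # ({f, g} o C)(s')
print(lhs, rhs)
```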


Volume preservation 

Consider a canonical transformation C. Let $C_t$ be a function with parameter t such that $(q, p) = C_t(q', p')$ if $(t, q, p) = C(t, q', p')$. The function $C_t$ maps phase-space coordinates to alternate phase-space coordinates at a given time. Consider regions R in (q, p) and R' in (q', p') such that $R = C_t(R')$. The volume of region R is


$$ V(R) = \int_R 1 = \int_{R'} \det(DC_t). \qquad (5.69) $$


Now if C is symplectic then the determinant of $DC_t$ is one (see section 4.2), so


$$ V(R) = V(R'). \qquad (5.70) $$


Thus, phase space volume is preserved by symplectic transforma- 
tions. 
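As a concrete instance, this Python sketch approximates DC_t for the time-independent polar-to-rectangular transformation and computes its determinant by Gaussian elimination. The transformation is written out by hand as an assumption of the sketch. For comparison it includes a map that doubles both momenta. That map is not symplectic, and in this two-degree-of-freedom example it scales volume by 4.

```python
import math


def polar_to_rect(s):
    """C_t: [r, phi, p_r, p_phi] -> [x, y, p_x, p_y]."""
    r, phi, p_r, p_phi = s
    c, sn = math.cos(phi), math.sin(phi)
    return [r * c, r * sn, p_r * c - p_phi * sn / r, p_r * sn + p_phi * c / r]


def double_momenta(s):
    """A non-symplectic comparison map: (q, p) -> (q, 2p)."""
    return [s[0], s[1], 2 * s[2], 2 * s[3]]


def jacobian(f, x, h=1e-5):
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n, d = len(A), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d
        d *= A[k][k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return d


s = [1.7, 0.6, -0.4, 2.1]   # arbitrary (r, phi, p_r, p_phi)
print(det(jacobian(polar_to_rect, s)))    # close to 1
print(det(jacobian(double_momenta, s)))   # close to 4
```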

Liouville’s theorem shows that time evolution preserves phase 
space volume. Here we see that canonical transformations also 
preserve phase volumes. Later, we will find that time evolution 
actually generates a canonical transformation. 


A bilinear form preserved by symplectic transformations 
The invariance of Poisson brackets under canonical transforma- 
tions can be used to prove the invariance of another closely-related 
antisymmetric bilinear form under canonical transformations. Define¹¹


$$ \omega(\zeta_1, \zeta_2) = P(\zeta_2)\,Q(\zeta_1) - P(\zeta_1)\,Q(\zeta_2), \qquad (5.71) $$


where $Q = I_1$ and $P = I_2$ are the coordinate and momentum selectors, respectively. The arguments $\zeta_1$ and $\zeta_2$ are incremental phase-space states. Under a canonical transformation $s = C(s')$, incremental states transform with the derivative


$$ \zeta_i = DC(s')\,\zeta'_i. \qquad (5.72) $$


We will show that


$$ \omega(\zeta'_1, \zeta'_2) = \omega(\zeta_1, \zeta_2), \qquad (5.73) $$


provided the $\zeta'_i$ have zero time component.

Condition (5.27) that a time-independent C with compositional 
Hamiltonian H is canonical is equivalent to the symplectic condi- 
tion (5.31), which does not mention the Hamiltonian H. So for 
time-independent symplectic C, condition (5.27) is also satisfied 
with the Hamiltonian replaced by any function f on the phase- 
state space: 


$$ J(Df(C(s))) = DC(s) \cdot \big(J(D(f \circ C)(s))\big). \qquad (5.74) $$


We will use this in the following.

In terms of ω the Poisson bracket is


$$ \{f, g\}(s) = \omega\big((J \circ Df)(s), (J \circ Dg)(s)\big), \qquad (5.75) $$


as can be seen by writing out the components. We use the fact that Poisson brackets are invariant under canonical transformations:


$$ (\{f, g\} \circ C)(s') = \{f \circ C, g \circ C\}(s'). \qquad (5.76) $$


¹¹ The ω form can also be written as a sum over degrees of freedom:


$$ \omega(\zeta_1, \zeta_2) = \sum_i \big(P_i(\zeta_2)\, Q^i(\zeta_1) - P_i(\zeta_1)\, Q^i(\zeta_2)\big). $$


Notice that the contributions for each i do not mix components from different degrees of freedom.
The left-hand side of equation (5.76) is

$$ \begin{aligned} (\{f, g\} \circ C)(s') &= \omega\big((J \circ Df \circ C)(s'),\ (J \circ Dg \circ C)(s')\big) \\ &= \omega\big(DC(s') \cdot (J(D(f \circ C)(s'))),\ DC(s') \cdot (J(D(g \circ C)(s')))\big), \end{aligned} \qquad (5.77) $$


where we have used the useful relation (5.74). The right-hand side of equation (5.76) is


$$ \{f \circ C, g \circ C\}(s') = \omega\big((J \circ D(f \circ C))(s'),\ (J \circ D(g \circ C))(s')\big). \qquad (5.78) $$


Now the left-hand side must equal the right-hand side for any f and g, so the equation must also be true for arbitrary $\zeta'_i$ of the form:


$$ \zeta'_1 = (J \circ D(f \circ C))(s'), \qquad \zeta'_2 = (J \circ D(g \circ C))(s'). \qquad (5.79) $$


The $\zeta'_i$ are arbitrary incremental states with zero time components. So we have proven that


$$ \omega(\zeta'_1, \zeta'_2) = \omega\big(DC(s') \cdot \zeta'_1,\ DC(s') \cdot \zeta'_2\big) \qquad (5.80) $$


for canonical C and incremental states $\zeta'_i$ with zero time components. Using equation (5.72) we have


$$ \omega(\zeta'_1, \zeta'_2) = \omega(\zeta_1, \zeta_2). \qquad (5.81) $$


Thus the bilinear antisymmetric function ω is invariant under canonical transformations.

As a program ω is:


(define (omega zeta1 zeta2)
  (- (* (momentum zeta2) (coordinate zeta1))
     (* (momentum zeta1) (coordinate zeta2))))


We can check that it is invariant under the polar to rectangular 
canonical transformation by computing the residuals. We use the 
arbitrary state 


(define a-polar-state
  (up 't
      (up 'r 'varphi)
      (down 'p_r 'p_varphi)))
and the typical state increments


(define zeta1
  (up 0
      (typical-object (coordinate a-polar-state))
      (typical-object (momentum a-polar-state))))


(define zeta2
  (up 0
      (typical-object (coordinate a-polar-state))
      (typical-object (momentum a-polar-state))))


Note that the time components of zeta1 and zeta2 are zero. We 
evaluate the residual: 


(print-expression
 (let ((DCs ((D (F->CT p->r)) a-polar-state)))
   (- (omega zeta1 zeta2)
      (omega (* DCs zeta1) (* DCs zeta2)))))
0


The residual is zero, so ω is invariant under this canonical transformation.
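The same residual check can also be done numerically, as a sketch outside scmutils. Here ζ₁ and ζ₂ are random increments with zero time component. DC(s') is approximated by central differences of a hand-written canonical extension of p->r, and omega follows (5.71). The function names echo the Scheme ones, but the code is only an illustration.

```python
import math
import random


def C_polar_to_rect(s):
    """[t, r, phi, p_r, p_phi] -> [t, x, y, p_x, p_y]."""
    t, r, phi, p_r, p_phi = s
    c, sn = math.cos(phi), math.sin(phi)
    return [t, r * c, r * sn, p_r * c - p_phi * sn / r, p_r * sn + p_phi * c / r]


def jacobian(f, x, h=1e-5):
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]


def omega(z1, z2, n):
    """(5.71): P(z2).Q(z1) - P(z1).Q(z2) for z = [dt, dq^1..dq^n, dp_1..dp_n]."""
    q1, p1 = z1[1:1 + n], z1[1 + n:]
    q2, p2 = z2[1:1 + n], z2[1 + n:]
    return sum(p2[i] * q1[i] - p1[i] * q2[i] for i in range(n))


def mat_vec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


rng = random.Random(0)
a_polar_state = [0.3, 1.7, 0.6, -0.4, 2.1]
zeta1 = [0.0] + [rng.uniform(-1, 1) for _ in range(4)]
zeta2 = [0.0] + [rng.uniform(-1, 1) for _ in range(4)]
DCs = jacobian(C_polar_to_rect, a_polar_state)
residual = omega(zeta1, zeta2, 2) - omega(mat_vec(DCs, zeta1), mat_vec(DCs, zeta2), 2)
print(residual)   # close to 0
```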


Poincaré integral invariants 

Consider the oriented area of a region R’ in phase space (see fig- 
ure 5.2). Suppose we make a canonical transformation from coor- 
dinates (q', p’) to (q, p) taking region R’ to region R. The bound- 
ary of the region in the transformed coordinates is just the image 
under the canonical transformation of the original boundary. Let 
$R_{q^i p_i}$ be the projection of the region R onto the $q^i, p_i$ plane of coordinate $q^i$ and conjugate momentum $p_i$, and $A_i$ be its area. We call the $q^i, p_i$ plane the $i$th canonical plane in these phase-space variables. Similarly, let $R'_{q'^i p'_i}$ be the projection of R' onto the $q'^i, p'_i$ plane, and $A'_i$ be its area. Then it turns out that the sums of the areas of the projections of R and R' are the same:


$$ \sum_i A_i = \sum_i A'_i. \qquad (5.82) $$
That is, the sum of the projected areas on the canonical planes is preserved by canonical transformations. Another way to say this is


$$ \sum_i \int_{R_{q^i p_i}} dq^i\, dp_i = \sum_i \int_{R'_{q'^i p'_i}} dq'^i\, dp'_i. \qquad (5.83) $$


Figure 5.2 A region R' in phase space is mapped by a canonical transformation C to a region R. The projections of region R onto the planes formed by canonical basis pairs $(q^i, p_i)$ are $R_i$. The projections of R' are $R'_i$. In general, the areas of the regions R and R' are not the same, but the sums of the areas of the canonical plane projections are the same.


To see why this is true we first consider how the area of an incre- 
mental parallelogram in phase space transforms under canonical 
transformation. Let $(\Delta q, \Delta p)$ and $(\delta q, \delta p)$ represent small increments in phase space, originating at $(q, p)$. Consider the incremental parallelogram with vertex at $(q, p)$ with these two phase-space increments as edges. The sum of the areas of the canonical projections of this incremental parallelogram can be written


$$ \sum_i \Delta A_i = \sum_i \left(\Delta q^i\, \delta p_i - \Delta p_i\, \delta q^i\right). \qquad (5.84) $$


The right-hand side is the sum of the areas on the canonical planes; for each i we see the area of a parallelogram computed from the components of the vectors defining its adjacent sides. Let $\zeta_1 = (0, \Delta q, \Delta p)$ and $\zeta_2 = (0, \delta q, \delta p)$; then the sum of the areas of the incremental parallelograms is just


$$ \sum_i \Delta A_i = \omega(\zeta_1, \zeta_2), \qquad (5.85) $$


where ω is the bilinear antisymmetric function introduced above.
The function ω is invariant under canonical transformations, so
the sum of the areas of the incremental parallelograms is invariant 
under canonical transformations. 

The area of an arbitrary region is just the limit of the sum of the 
areas of incremental parallelograms that cover the region, so the 
sum of oriented areas is preserved by canonical transformations: 


$$ \sum_i A_i = \sum_i A'_i. \qquad (5.86) $$


We define an action-like region to be one for which canonical 
coordinates can be chosen so that the region is entirely in the 
subspace spanned by a particular canonical pair $(q^i, p_i)$. For this
coordinate system the projection on that plane has all of the area. 
The projections on the other canonical planes have no area. So 
the sum of the areas of the canonical projections is just the area 
of the region itself. The sum of the areas of the projections onto 
canonical planes is preserved under canonical transformation so 
the area of an action-like region is the sum of the areas of the 
canonical projections for any canonical coordinate system. 

There are also regions which have no action-like projection. For 
example, a region in the $(q^1, q^2)$ plane has no action-like projection.
Therefore the sum of the areas of the canonical projections is zero, 
and this is the case for any canonical coordinate system, though 
in other canonical coordinates some of the projections may have 
non-zero area to be balanced by negative area of others. 

The equality of areas relation (5.83) can also be written as an equality of line integrals using Stokes' Theorem, for simply connected regions $R_{q^i p_i}$ and $R'_{q'^i p'_i}$:


$$ \sum_i \oint_{\partial R_{q^i p_i}} p_i\, dq^i = \sum_i \oint_{\partial R'_{q'^i p'_i}} p'_i\, dq'^i. \qquad (5.87) $$


The canonical planes are disjoint except at the origin, so the pro- 
jected areas only intersect in at most one point. Thus we may 
independently accumulate the line integrals around the boundaries of the individual projections of the region onto the canonical planes into a line integral around the unprojected region:


$$ \oint_{\partial R} \sum_i p_i\, dq^i = \oint_{\partial R'} \sum_i p'_i\, dq'^i. \qquad (5.88) $$
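As a one-degree-of-freedom numerical illustration of (5.88), take the transformation of (5.64) and the closed curve I = I₀, traversed once as θ runs from 0 to 2π. On the primed side the line integral of I dθ is 2πI₀. The Python sketch below approximates the line integral of p_x dx around the image curve by a fine inscribed polygon. The values of α, I₀, and the number of segments are arbitrary choices.

```python
import math

ALPHA, I0, N = 2.5, 1.3, 4000


def point(theta):
    """Image (x, p_x) of the point (theta, I0) under (5.64)."""
    return (math.sqrt(2 * I0 / ALPHA) * math.sin(theta),
            math.sqrt(2 * I0 * ALPHA) * math.cos(theta))


def loop_integral_p_dx(n):
    """Sum of (p_k + p_{k+1})/2 * (x_{k+1} - x_k) around the closed polygon."""
    pts = [point(2 * math.pi * k / n) for k in range(n)]
    total = 0.0
    for k in range(n):
        (x0, p0), (x1, p1) = pts[k], pts[(k + 1) % n]
        total += 0.5 * (p0 + p1) * (x1 - x0)
    return total


lhs = loop_integral_p_dx(N)   # line integral of p_x dx
rhs = 2 * math.pi * I0        # line integral of I dtheta along I = I0
print(lhs, rhs)
```

The polygon sum is the exact line integral over an inscribed polygon, so it slightly underestimates the area of the ellipse, with a relative error of order (2π/N)².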


Exercise 5.9: Watch out 
Consider the canonical transformation C: 


$$ (t, x, p) = C(t, \theta, J) = \left(t,\ \sqrt{2(J + a)}\,\sin\theta,\ \sqrt{2(J + a)}\,\cos\theta\right). $$


a. Show that the transformation is symplectic for any a. 


b. Show that equation (5.88) is not generally satisfied for the region 
enclosed by a curve of constant J. 


5.4 Extended Phase Space 


In this section we show that we can treat time as just another 
coordinate if we wish. Systems described by a time-dependent 
Hamiltonian may be recast in terms of a time-independent Hamil- 
tonian with an extra degree of freedom. An advantage of this view 
is that what was a time-dependent canonical transformation can 
be treated as a time-independent transformation, where there are 
no additional conditions for adjusting the Hamiltonian. 

Suppose that we have some system characterized by a time-varying Hamiltonian, for example, a periodically driven pendulum. We may imagine that there is some extremely massive oscillator, unperturbed by the motion of the relatively massless pendulum, that produces the drive. Indeed, we may think of time
itself as the coordinate of an infinitely massive particle moving 
uniformly and driving everything else. We often consider the ro- 
tation of the Earth as exactly such a stable time reference when 
performing short-time experiments in the laboratory. 

More formally, consider a dynamical system with n degrees of 
freedom, whose behavior is described by a possibly time-dependent 
Lagrangian L with corresponding Hamiltonian H. We make a new 
dynamical system with n+ 1 degrees of freedom by extending the 
generalized coordinates to include time and introducing a new in- 
dependent variable. We also extend the generalized velocities to 
include a velocity for the time coordinate. In this new extended 
state space the coordinates are redundant, so there is a constraint
relating the time coordinate to the new independent variable. 

We relate the original dynamical system to the extended dynamical system as follows: Let q be a coordinate path. Let $q_e, t : \tau \mapsto q_e(\tau), t(\tau)$ be a coordinate path in the extended system, where τ is the new independent variable. Then $q_e = q \circ t$, or $q_e(\tau) = q(t(\tau))$. Consequently, if $v = Dq$ is the velocity along a path, then $v_e(\tau) = Dq_e(\tau) = Dq(t(\tau)) \cdot Dt(\tau) = v(t(\tau)) \cdot v_t(\tau)$.

We can find a Lagrangian $L_e$ for the extended system by requiring that the value of the action is unchanged. Introduce the extended Lagrangian action


$$ S_e[q_e, t](\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} L_e \circ \Gamma[q_e, t], \qquad (5.89) $$

with

$$ L_e(\tau; q_e, q_t; v_e, v_t) = L(q_t, q_e, v_e/v_t)\, v_t. \qquad (5.90) $$

We have

$$ S[q](t(\tau_1), t(\tau_2)) = S_e[q_e, t](\tau_1, \tau_2). \qquad (5.91) $$


The Lagrange equations for $q_e$ are satisfied for exactly the same trajectories that satisfy the original Lagrange equations for q. The extended system is subject to a constraint that relates the time to the new independent variable. We assume the constraint is of the form $\varphi(\tau; q_e, q_t; v_e, v_t) = q_t - f(\tau) = 0$. The constraint is a holonomic constraint involving the coordinates and time, so we can incorporate this constraint by augmenting the Lagrangian:¹²


$$ L'_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) = L(q_t, q_e, v_e/v_t)\, v_t + v_\lambda\,\big(v_t - Df(\tau)\big). \qquad (5.92) $$
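The momenta conjugate to the coordinates are:


$$ \begin{aligned} p_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) &= \partial_{2,0} L'_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) \\ &= \partial_2 L(q_t, q_e, v_e/v_t) \\ &= \mathcal{P}(q_t, q_e, v_e/v_t) \end{aligned} \qquad (5.93) $$

$$ \begin{aligned} p_t(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) &= \partial_{2,1} L'_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) \\ &= L(q_t, q_e, v_e/v_t) - \partial_2 L(q_t, q_e, v_e/v_t)\,(v_e/v_t) + v_\lambda \\ &= -\mathcal{E}(q_t, q_e, v_e/v_t) + v_\lambda \end{aligned} \qquad (5.94) $$

$$ \begin{aligned} p_\lambda(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) &= \partial_{2,2} L'_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) \\ &= v_t - Df(\tau). \end{aligned} \qquad (5.95) $$


¹² We augment the Lagrangian with the total time derivative of the constraint so that the Legendre transform will be well defined.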


So the extended momenta have the same values as the original 
momenta at the corresponding states. The momentum conjugate 
to the time coordinate is the negation of the energy plus $v_\lambda$. The momentum conjugate to λ is the constraint, which must be zero.

Next we carry out the transformation to the corresponding 
Hamiltonian formulation. First, note that the Lagrangian Le is 
a homogeneous form of degree one in the velocities. Thus, by 
Euler’s theorem, 


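$$ \partial_2 L_e(\tau; q_e, q_t; v_e, v_t) \cdot (v_e, v_t) = L_e(\tau; q_e, q_t; v_e, v_t), \qquad (5.96) $$


and so the Legendre transform of $L_e$ is identically zero. For $L'_e$ there are additional terms:

$$ \begin{aligned} \partial_2 L'_e(\tau; q_e, q_t, \lambda; v_e, v_t, v_\lambda) \cdot (v_e, v_t, v_\lambda) &= \partial_2 L_e(\tau; q_e, q_t; v_e, v_t) \cdot (v_e, v_t) + v_\lambda v_t + \big(v_t - Df(\tau)\big) v_\lambda \\ &= L_e(\tau; q_e, q_t; v_e, v_t) + v_\lambda v_t + \big(v_t - Df(\tau)\big) v_\lambda. \end{aligned} \qquad (5.97) $$


So the Hamiltonian $H'_e$ corresponding to $L'_e$ is


$$ \begin{aligned} H'_e(\tau; q_e, q_t, \lambda; p_e, p_t, p_\lambda) &= v_\lambda v_t \\ &= \big(p_t + H(q_t, q_e, p_e)\big)\big(p_\lambda + Df(\tau)\big). \end{aligned} \qquad (5.98) $$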
We have used the fact that at corresponding states the momenta have the same values, so on paths $p_e = p \circ t$, and


$$ \mathcal{E}(q_t, q_e, v_e/v_t) = H(q_t, q_e, p_e). \qquad (5.99) $$


The Hamiltonian $H'_e$ does not depend on λ, so we deduce that $p_\lambda$ is constant. In fact $p_\lambda$ must be given the value zero, because it is the constraint. When there is a cyclic coordinate we can form a reduced Hamiltonian for the remaining degrees of freedom by substituting the constant value of the conserved momentum conjugate to the cyclic coordinate into the Hamiltonian. The resulting Hamiltonian is


$$ H_e(\tau; q_e, q_t; p_e, p_t) = \big(p_t + H(q_t, q_e, p_e)\big)\, Df(\tau). \qquad (5.100) $$


This extended Hamiltonian governs the evolution of the extended system, for arbitrary f.¹³ Hamilton's equations reduce to


$$ \begin{aligned} Dq_e(\tau) &= \partial_2 H(t(\tau), q_e(\tau), p_e(\tau))\, Df(\tau) \\ Dt(\tau) &= Df(\tau) \\ Dp_e(\tau) &= -\partial_1 H(t(\tau), q_e(\tau), p_e(\tau))\, Df(\tau) \\ Dp_t(\tau) &= -\partial_0 H(t(\tau), q_e(\tau), p_e(\tau))\, Df(\tau). \end{aligned} \qquad (5.101) $$


The second equation gives the required relation between t and τ. The first and third equations are equivalent to Hamilton's equations in the original coordinates. We see this as follows. Using $q_e = q \circ t$, these can be rewritten


$$ \begin{aligned} Dq(t(\tau))\, Dt(\tau) &= \partial_2 H(t(\tau), q(t(\tau)), p(t(\tau)))\, Df(\tau) \\ Dp(t(\tau))\, Dt(\tau) &= -\partial_1 H(t(\tau), q(t(\tau)), p(t(\tau)))\, Df(\tau). \end{aligned} \qquad (5.102) $$


Using $Dt(\tau) = Df(\tau)$ and dividing these factors out, we recover Hamilton's equations.¹⁴
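The claim that (5.101) reproduces the original motion for any admissible f can be tested numerically. The Python sketch below uses a hypothetical example: a driven oscillator H(t, q, p) = p²/2 + q²/2 − ε q cos(ωt), and the reparametrization f(τ) = τ + 0.3 sin τ, which has Df > 0. It integrates the extended equations (5.101) in τ and the ordinary Hamilton's equations in t with a classical fourth-order Runge-Kutta step, then compares the two states at the same physical time. It also checks that p_t + H stays constant. That follows from (5.101): d(p_t + H)/dτ = (−∂₀H + ∂₀H + ∂₁H ∂₂H − ∂₂H ∂₁H) Df = 0.

```python
import math

EPS, W = 0.5, 1.3    # drive amplitude and frequency (sample values)


def H(t, q, p):
    return 0.5 * p * p + 0.5 * q * q - EPS * q * math.cos(W * t)


def f(tau):
    return tau + 0.3 * math.sin(tau)      # hypothetical t = f(tau)


def Df(tau):
    return 1.0 + 0.3 * math.cos(tau)      # always positive


def original_rhs(t, y):
    q, p = y
    return [p, -q + EPS * math.cos(W * t)]


def extended_rhs(tau, y):
    """Equations (5.101) for the extended state [q_e, t, p_e, p_t]."""
    q, t, p, _ = y
    d = Df(tau)
    return [p * d,                                  # Dq_e = d2H Df
            d,                                      # Dt   = Df
            (-q + EPS * math.cos(W * t)) * d,       # Dp_e = -d1H Df
            -EPS * q * W * math.sin(W * t) * d]     # Dp_t = -d0H Df


def rk4(rhs, x0, y0, x1, n):
    """Classical fixed-step fourth-order Runge-Kutta from x0 to x1."""
    h = (x1 - x0) / n
    x, y = x0, list(y0)
    for _ in range(n):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(x + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(x + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        x += h
    return y


q0, p0, tau_end = 1.0, 0.0, 8.0
t0 = f(0.0)
# Start with p_t = -H, so that p_t + H is zero initially.
qe, t_end, pe, pt = rk4(extended_rhs, 0.0, [q0, t0, p0, -H(t0, q0, p0)], tau_end, 4000)
q_orig, p_orig = rk4(original_rhs, t0, [q0, p0], t_end, 4000)

print(t_end - f(tau_end))           # t(tau) tracks f(tau)
print(qe - q_orig, pe - p_orig)     # same state at the same time t
print(pt + H(t_end, qe, pe))        # p_t + H stays near zero
```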

Now consider the special case for which the time is the same as the independent variable: $f(\tau) = \tau$, $Df(\tau) = 1$. In this case $q = q_e$ and $p = p_e$. The extended Hamiltonian becomes


$$ H_e(\tau; q_e, t; p_e, p_t) = p_t + H(t, q_e, p_e). \qquad (5.103) $$


Hamilton's equation for t becomes $Dt(\tau) = 1$, restating the constraint. Hamilton's equations for $Dq_e$ and $Dp_e$ are directly Hamilton's equations


$$ \begin{aligned} Dq(\tau) &= \partial_2 H(\tau, q(\tau), p(\tau)) \\ Dp(\tau) &= -\partial_1 H(\tau, q(\tau), p(\tau)). \end{aligned} \qquad (5.104) $$


¹³ Once we have made this reduction, taking $p_\lambda$ to be zero, we can no longer perform a Legendre transform back to the extended Lagrangian system; we cannot solve for $p_t$ in terms of $v_t$. However, the Legendre transform in the extended system from $H'_e$ to $L'_e$, with associated state variables, is well defined.


¹⁴ If f is strictly increasing then Df is never zero.
The extended Hamiltonian (5.103) does not depend on the independent variable, so it is a conserved quantity. Thus, up to an additive constant, $p_t$ is minus the energy. Hamilton's equation for $Dp_t$ relates the change of the energy to $\partial_0 H$. Note that in the more general case, the momentum conjugate to the time is not the negation of the energy. This choice, $t(\tau) = \tau$, is useful for a number of applications.

Note that the extension transformation is canonical in the sense 
that the two sets of equations of motion describe equivalent dy- 
namics. However, the transformation is not symplectic; in fact it 
does not even have the same number of input and output variables. 


Exercise 5.10: Homogeneous extended Lagrangian 


Verify that Le is homogeneous of degree one in the velocities, and that 
its Legendre transform is zero. 


Exercise 5.11: Lagrange equations 

a. Verify the claim that the Lagrange equations for qe are satisfied for 
exactly the same trajectories that satisfy the original Lagrange equations 
for q. 

b. Verify the claim that the Lagrange equation for t relates the rate of change of energy to $\partial_0 L$.


Exercise 5.12: Lorentz transformations 


Investigate Lorentz transformations as point transformations in the ex- 
tended phase space. 


Restricted three-body problem 
An example that shows the utility of reformulating a problem in 
the extended phase space is the restricted three-body problem: 
the motion of a low mass particle subject to the gravitational 
attraction of two other massive bodies, which move in some fixed 
orbit. The problem is an idealization of the situation where a 
body with very small mass moves in the presence of two bodies 
with much larger masses. Any effects of the smaller body on the 
larger bodies are neglected. In the simplest version, the motion of 
all three bodies is assumed to be in the same plane, and the orbit 
of the two massive bodies is circular. 

The motion of the bodies with larger masses is not influenced 
by the small mass so we model this situation as the small body 
moving in a time-varying field of the larger bodies undergoing 
a prescribed motion. This situation can be captured as a time-dependent Hamiltonian:


$$ H(t; x, y; p_x, p_y) = \frac{p_x^2 + p_y^2}{2m} - \frac{G m m_1}{r_1} - \frac{G m m_2}{r_2}, \qquad (5.105) $$


where $r_1$ and $r_2$ are the distances of the small body to the larger bodies, m is the mass of the small body, and $m_1$ and $m_2$ are the masses of the larger bodies. Note that $r_1$ and $r_2$ are quantities that depend both on the position of the small particle and on the time-varying positions of the massive particles.

The massive bodies are in circular orbits and maintain constant distances from the center of mass. Let $a_1$ and $a_2$ be the distances to the center of mass; then the distances satisfy $m_1 a_1 = m_2 a_2$. The angular frequency is $\Omega = \sqrt{G(m_1 + m_2)/a^3}$, where a is the distance between the masses.

In polar coordinates, with the center of mass of the subsystem of massive particles at the origin, and with r and θ describing the position of the low-mass particle, the positions of the two massive bodies are $a_2 = m_1 a/(m_1 + m_2)$ with $\theta_2 = \Omega t$, and $a_1 = m_2 a/(m_1 + m_2)$ with $\theta_1 = \Omega t + \pi$. The distances to the point masses are


$$ \begin{aligned} r_2^2 &= r^2 + a_2^2 - 2 a_2 r \cos(\theta - \Omega t) \\ r_1^2 &= r^2 + a_1^2 - 2 a_1 r \cos(\theta - \Omega t - \pi). \end{aligned} \qquad (5.106) $$
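The distance formulas (5.106) follow from the law of cosines. This minimal Python sketch compares them with distances computed from Cartesian positions of the primaries at angles Ωt + π and Ωt. The masses, the separation, and G = 1 are sample values chosen only for illustration.

```python
import math

G, m1, m2, a = 1.0, 0.9, 0.1, 1.0
Omega = math.sqrt(G * (m1 + m2) / a ** 3)
a1 = m2 * a / (m1 + m2)     # distance of body 1 from the center of mass
a2 = m1 * a / (m1 + m2)     # distance of body 2 from the center of mass


def distances_polar(t, r, theta):
    """r_1 and r_2 from equation (5.106)."""
    r1 = math.sqrt(r * r + a1 * a1 - 2 * a1 * r * math.cos(theta - Omega * t - math.pi))
    r2 = math.sqrt(r * r + a2 * a2 - 2 * a2 * r * math.cos(theta - Omega * t))
    return r1, r2


def distances_cartesian(t, r, theta):
    """The same distances from explicit positions of all three bodies."""
    small = (r * math.cos(theta), r * math.sin(theta))
    body1 = (a1 * math.cos(Omega * t + math.pi), a1 * math.sin(Omega * t + math.pi))
    body2 = (a2 * math.cos(Omega * t), a2 * math.sin(Omega * t))
    return math.dist(small, body1), math.dist(small, body2)


t, r, theta = 2.3, 1.4, 0.9
print(distances_polar(t, r, theta))
print(distances_cartesian(t, r, theta))
print(m1 * a1 - m2 * a2, a1 + a2 - a)   # center-of-mass condition and separation
```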


So, in polar coordinates, the Hamiltonian is


$$ H(t; r, \theta; p_r, p_\theta) = \frac{1}{2m}\left(p_r^2 + \frac{p_\theta^2}{r^2}\right) - \frac{G m m_1}{r_1} - \frac{G m m_2}{r_2}. \qquad (5.107) $$


We see therefore that the Hamiltonian can be written in terms of some function f such that


$$ H(t; r, \theta; p_r, p_\theta) = f(r, \theta - \Omega t, p_r, p_\theta). \qquad (5.108) $$


The essential feature is that θ and t appear in the Hamiltonian only in the combination $\theta - \Omega t$.

One way to get rid of the time dependence is to choose a new set of variables with one coordinate equal to this combination $\theta - \Omega t$, by making a point transformation to a rotating frame. We have shown that


$$ r' = r \qquad (5.109) $$

$$ \theta' = \theta - \Omega t \qquad (5.110) $$

$$ p'_r = p_r \qquad (5.111) $$

$$ p'_\theta = p_\theta \qquad (5.112) $$

with


$$ \begin{aligned} H'(t; r', \theta'; p'_r, p'_\theta) &= H(t; r', \theta' + \Omega t; p'_r, p'_\theta) - \Omega p'_\theta \\ &= f(r', \theta', p'_r, p'_\theta) - \Omega p'_\theta \end{aligned} \qquad (5.113) $$


is a canonical transformation. The new Hamiltonian, which is not the energy, is conserved because there is no explicit time dependence. It is a useful integral of motion: the Jacobi constant.¹⁵
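The claim that H' is conserved while the energy is not can be checked by direct integration. The Python sketch below uses sample values G = m = 1, m₁ = 0.9, m₂ = 0.1, a = 1, so Ω = 1. The initial condition is an arbitrary, roughly circular orbit outside the primaries. The sketch integrates Hamilton's equations for (5.105) in the inertial Cartesian frame with fourth-order Runge-Kutta. It monitors the energy H and the combination H − Ω(x p_y − y p_x). Because p_θ = x p_y − y p_x, this combination has the same value as H' in (5.113).

```python
import math

G, m, m1, m2, a = 1.0, 1.0, 0.9, 0.1, 1.0
Omega = math.sqrt(G * (m1 + m2) / a ** 3)
a1, a2 = m2 * a / (m1 + m2), m1 * a / (m1 + m2)


def primaries(t):
    """Positions of body 1 (angle Omega t + pi) and body 2 (angle Omega t)."""
    c, sn = math.cos(Omega * t), math.sin(Omega * t)
    return (-a1 * c, -a1 * sn), (a2 * c, a2 * sn)


def H(t, s):
    x, y, px, py = s
    (x1, y1), (x2, y2) = primaries(t)
    r1, r2 = math.hypot(x - x1, y - y1), math.hypot(x - x2, y - y2)
    return (px * px + py * py) / (2 * m) - G * m * m1 / r1 - G * m * m2 / r2


def jacobi_H(t, s):
    """H - Omega p_theta, the value of H' in (5.113)."""
    x, y, px, py = s
    return H(t, s) - Omega * (x * py - y * px)


def rhs(t, s):
    x, y, px, py = s
    (x1, y1), (x2, y2) = primaries(t)
    d1x, d1y, d2x, d2y = x - x1, y - y1, x - x2, y - y2
    k1 = G * m * m1 / math.hypot(d1x, d1y) ** 3
    k2 = G * m * m2 / math.hypot(d2x, d2y) ** 3
    return [px / m, py / m, -k1 * d1x - k2 * d2x, -k1 * d1y - k2 * d2y]


def rk4_step(t, s, h):
    k1 = rhs(t, s)
    k2 = rhs(t + h / 2, [u + h / 2 * v for u, v in zip(s, k1)])
    k3 = rhs(t + h / 2, [u + h / 2 * v for u, v in zip(s, k2)])
    k4 = rhs(t + h, [u + h * v for u, v in zip(s, k3)])
    return [u + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for u, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]


t, h = 0.0, 0.005
s = [2.5, 0.0, 0.0, m * math.sqrt(G * (m1 + m2) / 2.5)]
E0, C0 = H(t, s), jacobi_H(t, s)
energy_swing = jacobi_drift = 0.0
for _ in range(4000):            # integrate to t = 20
    s = rk4_step(t, s, h)
    t += h
    energy_swing = max(energy_swing, abs(H(t, s) - E0))
    jacobi_drift = max(jacobi_drift, abs(jacobi_H(t, s) - C0))

print(energy_swing, jacobi_drift)   # energy changes; H' is constant to integration error
```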

We can also eliminate the dependence on the independent time-like variable from the Hamiltonian for the restricted problem by going to the extended phase space, choosing $t = \tau$. The Hamiltonian


$$ \begin{aligned} H_e(\tau; r, \theta, t; p_r, p_\theta, p_t) &= H(t; r, \theta; p_r, p_\theta) + p_t \\ &= f(r, \theta - \Omega t, p_r, p_\theta) + p_t \end{aligned} \qquad (5.114) $$


is autonomous and is consequently an integral of the motion. Again, we see that θ and t occur only in the combination $\theta - \Omega t$, which suggests a point transformation to a new coordinate $\theta' = \theta - \Omega t$. This point transformation is independent of the new independent variable τ. The transformation is specified in equations (5.109-5.112), augmented by relations specifying the way the time coordinate and its conjugate momentum are handled:


$$ t' = t \qquad (5.115) $$

$$ p'_t = \Omega p_\theta + p_t. \qquad (5.116) $$


The new Hamiltonian is obtained by composing the old Hamiltonian with the transformation:


$$ \begin{aligned} H'_e(\tau; r', \theta', t'; p'_r, p'_\theta, p'_t) &= H_e(\tau; r', \theta' + \Omega t', t'; p'_r, p'_\theta, p'_t - \Omega p'_\theta) \\ &= f(r', \theta', p'_r, p'_\theta) + p'_t - \Omega p'_\theta. \end{aligned} \qquad (5.117) $$


¹⁵ Actually, the traditional Jacobi constant is $C = -2H'$.
We recognize that the new Hamiltonian in the extended phase space, which has the same value as the original Hamiltonian in the extended phase space, is just the Jacobi constant plus $p'_t$. Now, the new Hamiltonian does not depend on $t'$, so $p'_t$ is a constant of the motion. In fact its value is irrelevant to the rest of the dynamical evolution, so we may set the value of $p'_t$ to zero if we
like. Thus, we have found that the Hamiltonian in the extended 
phase space, which is conserved, is just the Jacobi constant plus 
an additive arbitrary constant. We have two routes to the Jacobi 
constant: (1) transform the original system to a rotating frame 
to eliminate the time dependence, but in the process add extra 
terms to the Hamiltonian, and (2) go to the extended phase space 
and immediately get an integral, and by going to a rotating frame 
of reference recognize that this Hamiltonian is the same as the 
Jacobi constant. So sometimes the Hamiltonian in the extended 
phase space is a useful integral. 
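As a numerical illustration of the claim that $H - \Omega p_\theta$ is conserved while the energy is not, here is a small Python sketch. The system is a hypothetical stand-in for the restricted problem, not taken from the text: a unit-mass particle in an anisotropic harmonic well that rotates with angular velocity `OMEGA`, written in inertial rectangular coordinates, where $p_\theta = x p_y - y p_x$. The constants `A`, `B`, `OMEGA`, the initial state, and the RK4 step size are all assumptions chosen for illustration.

```python
import math

# Hypothetical example (not from the text): a unit-mass particle in a rotating
# anisotropic harmonic well. In coordinates rotating with angular velocity OMEGA
# the well is static, so H depends on theta and t only through theta - OMEGA*t.
A, B, OMEGA = 1.0, 2.0, 0.5  # assumed stiffnesses and frame angular velocity


def rotating_coords(t, x, y):
    """Coordinates of (x, y) in the frame rotated by the angle OMEGA*t."""
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return c * x + s * y, -s * x + c * y


def energy(t, state):
    x, y, px, py = state
    xr, yr = rotating_coords(t, x, y)
    return 0.5 * (px**2 + py**2) + 0.5 * (A * xr**2 + B * yr**2)


def jacobi(t, state):
    # H - OMEGA * p_theta, with p_theta = x*p_y - y*p_x (cf. equation 5.113)
    x, y, px, py = state
    return energy(t, state) - OMEGA * (x * py - y * px)


def rhs(t, state):
    # Hamilton's equations in inertial rectangular coordinates
    x, y, px, py = state
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    xr, yr = rotating_coords(t, x, y)
    dV_dx = A * xr * c - B * yr * s
    dV_dy = A * xr * s + B * yr * c
    return (px, py, -dV_dx, -dV_dy)


def rk4_step(t, state, dt):
    k1 = rhs(t, state)
    k2 = rhs(t + dt / 2, [u + dt / 2 * k for u, k in zip(state, k1)])
    k3 = rhs(t + dt / 2, [u + dt / 2 * k for u, k in zip(state, k2)])
    k4 = rhs(t + dt, [u + dt * k for u, k in zip(state, k3)])
    return [u + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
            for u, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)]


t, dt = 0.0, 1e-3
state = [1.0, 0.0, 0.0, 0.2]
energies, jacobis = [energy(t, state)], [jacobi(t, state)]
for _ in range(10_000):
    state = rk4_step(t, state, dt)
    t += dt
    energies.append(energy(t, state))
    jacobis.append(jacobi(t, state))

print("spread of H:            ", max(energies) - min(energies))
print("spread of H - OMEGA*p_th:", max(jacobis) - min(jacobis))
```

If the sketch is right, the first printed spread should be clearly nonzero, while the second should be limited only by the integration error.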


Exercise 5.13: Transformations in the extended phase space 


In section 5.2.3 we found that time-dependent transformations for which 
the derivative of the coordinate-momentum part is symplectic are canon- 
ical only if the Hamiltonian is modified by adding a function K subject 
to certain constraints (equation 5.54). Show that the constraints on K 
follow from the symplectic condition in the extended phase space, using 
the choice that $t = \tau$.


5.4.1 Poincaré-Cartan Integral Invariant 


A time-dependent transformation is canonical if in the extended phase space the Hamiltonians transform by composition and the extended phase-space transformation is symplectic. In section 5.3 we showed that if the derivative of the transformation is symplectic then the sum of the areas of the projections of any two-dimensional region of phase space onto the canonical $q^i, p_i$ planes is preserved. This is also true of symplectic transformations in the extended phase space. Let $R$ and $R'$ be corresponding regions of extended phase-space coordinates. Let $A_i$ be the area of the projection of the region $R$ onto the canonical $q^i, p_i$ plane, and $A'_i$ be the area of the projection of the corresponding region $R'$ onto the canonical $q'^i, p'_i$ plane. In the extended phase space, we also have a projection onto the $t, p_t$ canonical plane. Let $A_n$ be the area of the projection onto the $t, p_t$ plane. We then have


$$\sum_{i=0}^{n} A_i = \sum_{i=0}^{n} A'_i. \tag{5.118}$$


In terms of integrals this is 


$$\sum_{i=0}^{n} \int_{R_i} dq^i\, dp_i = \sum_{i=0}^{n} \int_{R'_i} dq'^i\, dp'_i. \tag{5.119}$$


This equality for the sum of area integrals can be rewritten in 
terms of line integrals by Stokes’ theorem: 


$$\oint_{\partial R} \sum_{i=0}^{n} p_i\, dq^i = \oint_{\partial R'} \sum_{i=0}^{n} p'_i\, dq'^i, \tag{5.120}$$


where the order of the integration and summation can be reversed 
because the boundary of R projects to the boundary on the canon- 
ical planes. 

For the special choice $t = \tau$ this result can be rephrased in an interesting way. Let $E$ be the value of the Hamiltonian in the original unextended phase space. Using $q^n = t$ and $p_n = p_t = -E$ we can write


$$\sum_{i=0}^{n-1} \int_{R_i} dq^i\, dp_i - \int_{R_n} dt\, dE = \sum_{i=0}^{n-1} \int_{R'_i} dq'^i\, dp'_i - \int_{R'_n} dt'\, dE' \tag{5.121}$$


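and

$$\oint_{\partial R} \Bigl(\sum_{i=0}^{n-1} p_i\, dq^i - E\, dt\Bigr) = \oint_{\partial R'} \Bigl(\sum_{i=0}^{n-1} p'_i\, dq'^i - E'\, dt'\Bigr). \tag{5.122}$$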


Relations (5.121) and (5.122) are two formulations of the Poincaré-Cartan integral invariant.
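Relation (5.122) can be checked numerically for one particular canonical transformation: the time evolution of a Hamiltonian system, which carries every point of a closed loop in extended phase space forward by the same duration along its trajectory. The Python sketch below is an illustration under assumptions: the driven-oscillator Hamiltonian, the loop, the duration, and the discretization (`N`, the RK4 step count) are all hypothetical choices. The points of the loop sit at different times, so the $-E\,dt$ term genuinely contributes.

```python
import math

# Hypothetical driven anharmonic oscillator (an assumption, not from the text):
#   H(t, q, p) = p**2/2 + q**4/4 + EPS*q*cos(t)
EPS = 0.3


def hamiltonian(t, q, p):
    return 0.5 * p**2 + 0.25 * q**4 + EPS * q * math.cos(t)


def rhs(t, q, p):
    # Dq = dH/dp, Dp = -dH/dq
    return p, -(q**3 + EPS * math.cos(t))


def evolve(point, duration, steps=200):
    """Carry an extended phase-space point (t, q, p) along its trajectory (RK4)."""
    t, q, p = point
    dt = duration / steps
    for _ in range(steps):
        k1 = rhs(t, q, p)
        k2 = rhs(t + dt / 2, q + dt / 2 * k1[0], p + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, q + dt / 2 * k2[0], p + dt / 2 * k2[1])
        k4 = rhs(t + dt, q + dt * k3[0], p + dt * k3[1])
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return t, q, p


def cartan_integral(loop):
    """Trapezoidal estimate of the closed-loop integral of (p dq - E dt), E = H."""
    total = 0.0
    for k in range(len(loop)):
        t0, q0, p0 = loop[k]
        t1, q1, p1 = loop[(k + 1) % len(loop)]
        p_avg = 0.5 * (p0 + p1)
        e_avg = 0.5 * (hamiltonian(t0, q0, p0) + hamiltonian(t1, q1, p1))
        total += p_avg * (q1 - q0) - e_avg * (t1 - t0)
    return total


# A closed loop (t, q, p) whose points sit at different times t.
N = 800
loop0 = []
for k in range(N):
    s = 2 * math.pi * k / N
    loop0.append((0.4 * math.sin(s + 1.0), 0.5 + 0.3 * math.cos(s), 0.3 * math.sin(s)))

loop1 = [evolve(point, duration=1.5) for point in loop0]

I0, I1 = cartan_integral(loop0), cartan_integral(loop1)
print(I0, I1)
```

Only the full combination $\oint (p\,dq - E\,dt)$ is expected to agree between the two loops, up to discretization error; because the loop is not at a single time, dropping the $E\,dt$ term would in general break the agreement.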


5.5 Reduced Phase Space 
Suppose we have a system with $n+1$ degrees of freedom described by a time-independent Hamiltonian in a $(2n+2)$-dimensional phase space. Here we can play the converse game: we can choose any generalized coordinate to play the role of "time," and the negation of its conjugate momentum to play the role of a new time-dependent Hamiltonian with $n$ degrees of freedom, in a reduced phase space of $2n$ dimensions.

More precisely, let 


$$q = (q^0, \ldots, q^n), \qquad p = [p_0, \ldots, p_n], \tag{5.123}$$


and suppose we have a system described by a time-independent 
Hamiltonian 


$$H(t, q, p) = f(q, p) = E. \tag{5.124}$$


For each solution path there is a conserved quantity $E$. Let's choose the coordinate $q^n$ to be the time in a reduced phase space. We define the dynamical variables for the $n$-degree-of-freedom reduced phase space:


$$q_r = (q^0, \ldots, q^{n-1}), \qquad p_r = [p_0, \ldots, p_{n-1}]. \tag{5.125}$$


In the original phase space a coordinate such as $q^n$ maps time to a coordinate. In the formulation of the reduced phase space we will have to use the inverse function $\tau = (q^n)^{-1}$ to map the coordinate to the time, giving the new coordinates in terms of the new time


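$$q_r^i = q^i \circ \tau, \qquad p_{r,i} = p_i \circ \tau,$$

and thus

$$Dq_r^i = D(q^i \circ \tau) = (Dq^i \circ \tau)(D\tau) = (Dq^i \circ \tau)/(Dq^n \circ \tau) \tag{5.126}$$
$$Dp_{r,i} = D(p_i \circ \tau) = (Dp_i \circ \tau)(D\tau) = (Dp_i \circ \tau)/(Dq^n \circ \tau). \tag{5.127}$$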
We propose that a Hamiltonian for our system in the reduced phase space is the negative of the inverse of $f(q^0, \ldots, q^n; p_0, \ldots, p_n) = E$ with respect to the $p_n$ argument:


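$$H_r(x, q_r, p_r) = -\bigl(\text{the } p_n \text{ such that } f(q_r, x; p_r, p_n) = E\bigr), \tag{5.128}$$

where $x$ is the value of the coordinate $q^n$, the new time.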


Note that in the reduced phase space the indices of the structured variables range over $0, \ldots, n-1$, whereas in the original phase space they range over $0, \ldots, n$. We will show that $H_r$ is an appropriate Hamiltonian for the given dynamical system in the reduced phase space. To compute Hamilton's equations we must expand the implicit definition of $H_r$. We define an auxiliary function


$$g(x, q_r, p_r) = f(q_r, x; p_r, -H_r(x, q_r, p_r)). \tag{5.129}$$


By construction this function is identically equal to the constant $E$, so all of its partial derivatives are zero:


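$$\begin{aligned} \partial_0 g &= (\partial_0 f)_n - (\partial_1 f)^n\, \partial_0 H_r = 0 \\ (\partial_1 g)_i &= (\partial_0 f)_i - (\partial_1 f)^n\, (\partial_1 H_r)_i = 0 \\ (\partial_2 g)^i &= (\partial_1 f)^i - (\partial_1 f)^n\, (\partial_2 H_r)^i = 0, \end{aligned} \tag{5.130}$$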


where we have suppressed the arguments. Solving for the partials of $H_r$, we get


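$$(\partial_1 H_r)_i = (\partial_0 f)_i / (\partial_1 f)^n = (\partial_1 H)_i / (\partial_2 H)^n \tag{5.131}$$
$$(\partial_2 H_r)^i = (\partial_1 f)^i / (\partial_1 f)^n = (\partial_2 H)^i / (\partial_2 H)^n. \tag{5.132}$$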


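Using these relations together with (5.126) and (5.127), we can deduce Hamilton's equations in the reduced phase space from Hamilton's equations in the original phase space. Along a solution path $Dq^i = (\partial_2 H)^i$ and $Dq^n = (\partial_2 H)^n$, so (5.126) and (5.132) give $Dq_r^i = (\partial_2 H_r)^i$; similarly, $Dp_i = -(\partial_1 H)_i$ together with (5.127) and (5.131) gives $Dp_{r,i} = -(\partial_1 H_r)_i$. We thus obtain Hamilton's equations in the reduced phase space: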


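$$Dq_r^i(x) = \bigl(\partial_2 H_r(x, q_r(x), p_r(x))\bigr)^i \tag{5.133}$$
$$Dp_{r,i}(x) = -\bigl(\partial_1 H_r(x, q_r(x), p_r(x))\bigr)_i. \tag{5.134}$$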
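The implicit-function relations (5.130)-(5.132) can be spot-checked by finite differences. This Python sketch assumes a hypothetical two-degree-of-freedom Hamiltonian `f`, an energy `E = 2.0`, the branch $p_1 > 0$, and a sample point. Here $n = 1$, so $q^1$ plays the role of the new time $x$ and $(q^0, p_0)$ are the reduced-phase-space variables.

```python
import math

E = 2.0  # assumed value of the conserved energy


def f(q0, q1, p0, p1):
    # hypothetical two-degree-of-freedom Hamiltonian f(q, p); q1 will serve as "time"
    return 0.5 * (p0**2 + p1**2) + 0.5 * q0**2 + 0.5 * q1**2 + 0.1 * q0**2 * q1


def H_r(x, q0, p0):
    # (5.128) with n = 1: minus the p1 that solves f(q0, x; p0, p1) = E.
    # We assume the branch p1 > 0; the other root gives the other branch.
    radicand = 2 * (E - 0.5 * q0**2 - 0.5 * x**2 - 0.1 * q0**2 * x) - p0**2
    return -math.sqrt(radicand)


def partial(fun, args, k, h=1e-6):
    """Central finite difference of fun with respect to its k-th argument."""
    plus, minus = list(args), list(args)
    plus[k] += h
    minus[k] -= h
    return (fun(*plus) - fun(*minus)) / (2 * h)


x, q0, p0 = 0.4, 0.3, 0.5
p1 = -H_r(x, q0, p0)
point = (q0, x, p0, p1)
df_dq0, df_dq1, df_dp0, df_dp1 = (partial(f, point, k) for k in range(4))

# Each entry pairs a finite-difference partial of H_r with the ratio of
# partials of f predicted by (5.130)-(5.132).
checks = {
    "d0 H_r": (partial(H_r, (x, q0, p0), 0), df_dq1 / df_dp1),
    "d1 H_r": (partial(H_r, (x, q0, p0), 1), df_dq0 / df_dp1),
    "d2 H_r": (partial(H_r, (x, q0, p0), 2), df_dp0 / df_dp1),
}
for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.9f}  predicted {rhs:.9f}")
```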
Orbits in a central field
Consider planar motion in a central field. We have already seen 
this expressed in polar coordinates in equation (3.95): 


$$H(t; r, \varphi; p_r, p_\varphi) = \frac{p_r^2}{2m} + \frac{p_\varphi^2}{2mr^2} + V(r). \tag{5.135}$$


There are two degrees of freedom and the Hamiltonian is time-independent. Thus the energy, the value of the Hamiltonian, is conserved on realizable paths. Let's forget about time and reparametrize this system in terms of the orbital radius $r$.¹⁶ To do this we solve

$$E = \frac{p_r^2}{2m} + \frac{p_\varphi^2}{2mr^2} + V(r) \tag{5.136}$$

for $p_r$, obtaining


$$H'(r; \varphi; p_\varphi) = -p_r = -\left(2m\bigl(E - V(r)\bigr) - \frac{p_\varphi^2}{r^2}\right)^{1/2}, \tag{5.137}$$


which is the Hamiltonian in the reduced phase space. 
Hamilton’s equations are now quite simple: 


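$$\frac{d\varphi}{dr} = \frac{\partial H'}{\partial p_\varphi} = \frac{p_\varphi}{r^2}\left(2m\bigl(E - V(r)\bigr) - \frac{p_\varphi^2}{r^2}\right)^{-1/2} \tag{5.138}$$
$$\frac{dp_\varphi}{dr} = -\frac{\partial H'}{\partial \varphi} = 0. \tag{5.139}$$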


We see that $p_\varphi$ is independent of $r$ (as it was of $t$), so for any particular orbit we may define a constant angular momentum $L$. Thus our problem ends up as a simple quadrature:


$$\varphi(r) = \int \frac{L}{r^2}\left(2m\bigl(E - V(r)\bigr) - \frac{L^2}{r^2}\right)^{-1/2} dr + \varphi_0. \tag{5.140}$$


16 We could have chosen to reparametrize in terms of $\varphi$, but then both $p_r$ and $r$ would occur in the resulting time-independent Hamiltonian. The path we have chosen takes advantage of the fact that $\varphi$ does not appear in our Hamiltonian, so $p_\varphi$ is a constant of the motion. This structure suggests that to solve this kind of problem we need to look ahead, as in playing chess.
To see the utility of this procedure we continue our example with a definite potential energy, that of a gravitating mass point:


$$V(r) = -\frac{\mu}{r}. \tag{5.141}$$
When we substitute this into equation (5.140) we obtain a mess, 
which can be simplified to 

$$\varphi(r) = L \int \frac{dr}{r\sqrt{2mEr^2 + 2m\mu r - L^2}} + \varphi_0. \tag{5.142}$$


Integrating this, we obtain a further mess, which can be simplified 
and rearranged to obtain the following: 


$$\frac{1}{r} = \frac{m\mu}{L^2}\left(1 + \sqrt{1 + \frac{2EL^2}{m\mu^2}}\, \sin\bigl(\varphi(r) - \varphi_0\bigr)\right). \tag{5.143}$$
This can be recognized as the polar-coordinate form of the equa- 
tion of a conic section with eccentricity e and parameter p 


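$$\frac{1}{r} = \frac{1}{p}\,(1 + e\cos\theta), \tag{5.144}$$

where

$$e = \sqrt{1 + \frac{2EL^2}{m\mu^2}}, \qquad p = \frac{L^2}{m\mu}, \qquad \theta = \varphi(r) - \varphi_0 - \frac{\pi}{2}. \tag{5.145}$$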


In fact, if the orbit is an ellipse with semimajor axis $a$, we have

$$p = a(1 - e^2), \tag{5.146}$$


and so we can identify the role of energy and angular momentum 
in shaping the ellipse: 


$$E = -\frac{\mu}{2a} \qquad \text{and} \qquad L = \sqrt{m\mu a(1 - e^2)}. \tag{5.147}$$
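To see that the reduced-phase-space results (5.144)-(5.147) describe the same orbit as the time-domain motion, one can integrate Hamilton's equations for (5.135) with the potential (5.141) in time and compare. This Python sketch assumes $m = \mu = 1$ and a particular initial condition (both hypothetical). It checks the closest and farthest distances against $p/(1 \pm e)$, their mean against $a = -\mu/(2E)$, and the whole sampled orbit against (5.144), taking $\theta = 0$ at closest approach.

```python
import math

# Assumed units and initial data (hypothetical): m = mu = 1, V(r) = -mu/r.
m, mu = 1.0, 1.0
r0, phi0, pr0, L = 1.0, 0.0, 0.1, 0.9


def rhs(state):
    # Hamilton's equations for H = p_r^2/(2m) + p_phi^2/(2 m r^2) - mu/r
    r, phi, p_r, p_phi = state
    return (p_r / m, p_phi / (m * r**2), p_phi**2 / (m * r**3) - mu / r**2, 0.0)


def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs([u + dt / 2 * k for u, k in zip(state, k1)])
    k3 = rhs([u + dt / 2 * k for u, k in zip(state, k2)])
    k4 = rhs([u + dt * k for u, k in zip(state, k3)])
    return [u + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
            for u, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)]


E = pr0**2 / (2 * m) + L**2 / (2 * m * r0**2) - mu / r0
e = math.sqrt(1 + 2 * E * L**2 / (m * mu**2))  # eccentricity, (5.145)
p = L**2 / (m * mu)                            # parameter, (5.145)
a = -mu / (2 * E)                              # semimajor axis, (5.147)

state, samples = [r0, phi0, pr0, L], []
for _ in range(12_000):                        # roughly 2.4 orbital periods
    samples.append(state)
    state = rk4_step(state, 1e-3)

r_min = min(s[0] for s in samples)
r_max = max(s[0] for s in samples)
phi_peri = min(samples, key=lambda s: s[0])[1]  # theta = 0 at closest approach
residual = max(abs(1 / s[0] - (1 + e * math.cos(s[1] - phi_peri)) / p)
               for s in samples)
print(r_min, p / (1 + e))
print(r_max, p / (1 - e))
print((r_min + r_max) / 2, a)
print(residual)
```

The residual is limited mainly by locating the perihelion to within one time step.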


What we get from analysis in the reduced phase space is the 
geometry of the trajectory, but we lose the time-domain behavior. 
The reduction is often worth the price. 

Although we have treated time in a special way up until now, 
we have found that time is not special. It can be included in the 
coordinates to make a driven system autonomous. And it can be
eliminated from any autonomous system in favor of any other co- 
ordinate. This leads to numerous strategies for simplifying prob- 
lems, by removing time variation, and then performing canonical 
transforms on the resulting conservative autonomous system to 
make a nice coordinate that we can then dump back into the role 
of time. 


5.6 Generating Functions 


We have considered a number of properties of general canonical 
transformations, without having a general method for coming up 
with them. Here we introduce the method of generating functions. 
The generating function is a real-valued function which compactly 
specifies a canonical transformation through its partial derivatives, 
as follows. 

Consider a real-valued function $F_1(t, q, q')$ mapping configurations expressed in two coordinate systems to the reals. We will use $F_1$ to construct a canonical transformation from one coordinate system to the other. We will show that the following relations among the coordinates, the momenta, and the Hamiltonians specify a canonical transformation:


$$p = \partial_1 F_1(t, q, q') \tag{5.148}$$
$$p' = -\partial_2 F_1(t, q, q') \tag{5.149}$$
$$H'(t, q', p') - H(t, q, p) = \partial_0 F_1(t, q, q'). \tag{5.150}$$


The transformation will then be explicitly given by solving for one set of variables in terms of the others. To obtain the primed variables in terms of the unprimed ones, let $A$ be the inverse of $\partial_1 F_1$ with respect to the third argument,


$$q' = A(t, q, \partial_1 F_1(t, q, q')); \tag{5.151}$$

then

$$q' = A(t, q, p) \tag{5.152}$$
$$p' = -\partial_2 F_1(t, q, A(t, q, p)). \tag{5.153}$$
Let $B$ be the coordinate part of the phase-space transformation $q = B(t, q', p')$. This $B$ is an inverse function of $\partial_2 F_1$, satisfying

$$q = B(t, q', -\partial_2 F_1(t, q, q')). \tag{5.154}$$

Using $B$ we have

$$q = B(t, q', p') \tag{5.155}$$
$$p = \partial_1 F_1(t, B(t, q', p'), q'). \tag{5.156}$$


To put the transformation in explicit form requires that the inverse 
functions A and B exist. 

We can use the above relations to verify that some given transformation from one set of phase-space coordinates $(q, p)$ with Hamiltonian function $H(t, q, p)$ to another set $(q', p')$ with Hamiltonian function $H'(t, q', p')$ is canonical by finding an $F_1(t, q, q')$ such that the above relations are satisfied. We can also use arbitrarily chosen generating functions of type $F_1$ to generate new canonical transformations.


The polar-canonical transformation 
The polar-canonical transformation (5.32) from coordinate and momentum $(x, p_x)$ to new coordinate and new momentum $(\theta, I)$

$$x = \sqrt{\frac{2I}{\alpha}}\, \sin\theta \tag{5.157}$$
$$p_x = \sqrt{2I\alpha}\, \cos\theta, \tag{5.158}$$


introduced earlier, is canonical. This can also be demonstrated by finding a suitable $F_1$ generating function. The generating function satisfies the partial differential equations (5.148) and (5.149):


$$p_x = \partial_1 F_1(t, x, \theta) \tag{5.159}$$
$$I = -\partial_2 F_1(t, x, \theta). \tag{5.160}$$


Using the relations (5.157) and (5.158), which specify the canonical transformation, the first equation (5.159) can be rewritten

$$p_x = \alpha x \cot\theta = \partial_1 F_1(t, x, \theta), \tag{5.161}$$
which is easily integrated to yield

$$F_1(t, x, \theta) = \frac{\alpha}{2} x^2 \cot\theta + \phi(t, \theta), \tag{5.162}$$


where $\phi$ is some integration “constant” with respect to the first integration. Substituting this form for $F_1$ into the second partial differential equation (5.160) we find

$$I = -\partial_2 F_1(t, x, \theta) = \frac{\alpha x^2}{2\sin^2\theta} - \partial_1 \phi(t, \theta). \tag{5.163}$$


By (5.157), the first term on the right is exactly $I$, so if we set $\phi = 0$ the desired relations are recovered. So the generating function

$$F_1(t, x, \theta) = \frac{\alpha}{2} x^2 \cot\theta \tag{5.164}$$


generates the polar-canonical transformation. This shows that 
this transformation is canonical. 
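The claim that (5.164) generates the polar-canonical transformation can be spot-checked numerically: differentiate $F_1$ by finite differences at a sample point and compare with (5.157)-(5.158). The value of `ALPHA`, the sample point, and the step `h` are assumptions. The sketch also checks that the Jacobian of $(\theta, I) \mapsto (x, p_x)$ has unit determinant, which for one degree of freedom is the symplectic condition.

```python
import math

ALPHA = 1.3  # assumed value of the parameter alpha


def polar_to_xp(theta, I):
    """The polar-canonical transformation (5.157)-(5.158)."""
    return (math.sqrt(2 * I / ALPHA) * math.sin(theta),
            math.sqrt(2 * I * ALPHA) * math.cos(theta))


def F1(x, theta):
    """Generating function (5.164); it does not depend on t."""
    return 0.5 * ALPHA * x**2 / math.tan(theta)


def partial(fun, args, k, h=1e-6):
    """Central finite difference of fun with respect to its k-th argument."""
    plus, minus = list(args), list(args)
    plus[k] += h
    minus[k] -= h
    return (fun(*plus) - fun(*minus)) / (2 * h)


def jacobian_det(theta, I, h=1e-6):
    """Determinant of the derivative of (theta, I) -> (x, p_x)."""
    x_tp, p_tp = polar_to_xp(theta + h, I)
    x_tm, p_tm = polar_to_xp(theta - h, I)
    x_ip, p_ip = polar_to_xp(theta, I + h)
    x_im, p_im = polar_to_xp(theta, I - h)
    dx_dtheta, dp_dtheta = (x_tp - x_tm) / (2 * h), (p_tp - p_tm) / (2 * h)
    dx_dI, dp_dI = (x_ip - x_im) / (2 * h), (p_ip - p_im) / (2 * h)
    return dx_dtheta * dp_dI - dx_dI * dp_dtheta


theta, I = 0.7, 1.1
x, p_x = polar_to_xp(theta, I)
p_from_F1 = partial(F1, (x, theta), 0)    # (5.159): p_x = d1 F1
I_from_F1 = -partial(F1, (x, theta), 1)   # (5.160): I = -d2 F1
print(p_from_F1, p_x)
print(I_from_F1, I)
print(jacobian_det(theta, I))
```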
5.6.1 $F_1$ Generates Canonical Transformations


We can prove directly that the transformation generated by $F_1$ is canonical by showing that if Hamilton's equations are satisfied in one set of coordinates then Hamilton's equations will be satisfied in the other set of coordinates. Let $F_1$ take arguments $(t, x, y)$. The relations among the coordinates and momenta are

$$\begin{aligned} p_x &= \partial_1 F_1(t, x, y) \\ p_y &= -\partial_2 F_1(t, x, y) \end{aligned} \tag{5.165}$$


and the Hamiltonians are related by

$$H'(t, y, p_y) = H(t, x, p_x) + \partial_0 F_1(t, x, y). \tag{5.166}$$


Substituting the generating function relations (5.165) into this 
equation, we have 


$$H'(t, y, -\partial_2 F_1(t, x, y)) = H(t, x, \partial_1 F_1(t, x, y)) + \partial_0 F_1(t, x, y). \tag{5.167}$$
Take the partial derivatives of this equality of expressions with respect to the variables $x$ and $y$:¹⁷


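$$\begin{aligned} -(\partial_2 H')^j\, \bigl(\partial_1(\partial_2 F_1)_j\bigr)_i &= (\partial_1 H)_i + (\partial_2 H)^j \bigl(\partial_1(\partial_1 F_1)_j\bigr)_i + (\partial_1 \partial_0 F_1)_i \\ (\partial_1 H')_i - (\partial_2 H')^j \bigl(\partial_2(\partial_2 F_1)_j\bigr)_i &= (\partial_2 H)^j \bigl(\partial_2(\partial_1 F_1)_j\bigr)_i + (\partial_2 \partial_0 F_1)_i, \end{aligned} \tag{5.168}$$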
where the arguments are unambiguous and have been suppressed. On solution paths we can use Hamilton's equations for the $(x, p_x)$ system to replace the partial derivatives of $H$ with derivatives of $x$ and $p_x$, obtaining
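$$\begin{aligned} -(\partial_2 H')^j\, \bigl(\partial_1(\partial_2 F_1)_j\bigr)_i &= -(Dp_x)_i + (Dx)^j \bigl(\partial_1(\partial_1 F_1)_j\bigr)_i + (\partial_1 \partial_0 F_1)_i \\ (\partial_1 H')_i - (\partial_2 H')^j \bigl(\partial_2(\partial_2 F_1)_j\bigr)_i &= (Dx)^j \bigl(\partial_2(\partial_1 F_1)_j\bigr)_i + (\partial_2 \partial_0 F_1)_i. \end{aligned} \tag{5.169}$$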


Now compute the derivatives of $p_x$ and $p_y$, from equations (5.165), along consistent paths:


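$$\begin{aligned} (Dp_x)_i &= \bigl(\partial_1(\partial_1 F_1)_i\bigr)_j (Dx)^j + \bigl(\partial_2(\partial_1 F_1)_i\bigr)_j (Dy)^j + \partial_0(\partial_1 F_1)_i \\ (Dp_y)_i &= -\bigl(\partial_1(\partial_2 F_1)_i\bigr)_j (Dx)^j - \bigl(\partial_2(\partial_2 F_1)_i\bigr)_j (Dy)^j - \partial_0(\partial_2 F_1)_i. \end{aligned} \tag{5.170}$$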


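Substituting the first of these into the first of equations (5.169), the terms involving $Dx$ and the time derivatives of $F_1$ cancel because mixed second partial derivatives commute, leaving

$$-(\partial_2 H')^j\, \bigl(\partial_1(\partial_2 F_1)_j\bigr)_i = -\bigl(\partial_1(\partial_2 F_1)_j\bigr)_i\, (Dy)^j. \tag{5.171}$$

Here we have also used $(\partial_2(\partial_1 F_1)_i)_j = (\partial_1(\partial_2 F_1)_j)_i$. Provided that $\partial_1 \partial_2 F_1$ is non-singular,¹⁸ we have derived one of Hamilton's equations for the $(y, p_y)$ system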


$$Dy(t) = \partial_2 H'(t, y(t), p_y(t)). \tag{5.172}$$


17 Here we use indices to select particular components of structured objects. If an index symbol appears both as a superscript and as a subscript in an expression, the value of the expression is the sum over all possible values of the index symbol of the designated components (Einstein summation convention). Thus, for example, if $\dot q$ and $p$ are of dimension $n$ then the indicated product $p_i \dot q^i$ is to be interpreted as $\sum_{i=0}^{n-1} p_i \dot q^i$.


18 A structure is non-singular if the determinant of the matrix representation of the structure is non-zero.
The other Hamilton's equation,

$$Dp_y(t) = -\partial_1 H'(t, y(t), p_y(t)), \tag{5.173}$$


can be derived in a similar way. So the generating function rela- 
tions indeed specify a canonical transformation. 

What we have shown is that the transformation is canonical, 
which means that the equations of motion transform appropri- 
ately; we have not shown that the qp part of the transformation 
is symplectic. If the transformation is time-independent then the 
Hamiltonians transform by composition, and in that circumstance 
we know that canonical implies symplectic. 
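As a concrete instance, consider a harmonic oscillator with unit mass, $H = p_x^2/2 + \alpha^2 x^2/2$ (an assumed example). Composing with the time-independent polar-canonical transformation (5.157)-(5.158) gives $H'(t, \theta, I) = \alpha I$, so, by the result just proved, $\theta$ should advance at rate $\alpha$ while $I$ stays constant. This Python sketch integrates Hamilton's equations in $(x, p_x)$ and checks that behavior in $(\theta, I)$; `ALPHA`, the initial data, and the step size are hypothetical.

```python
import math

ALPHA = 1.3  # assumed; unit mass and H = p_x^2/2 + ALPHA^2 x^2/2, so H' = ALPHA*I


def xp_to_polar(x, p_x):
    """Inverse of the polar-canonical transformation (5.157)-(5.158)."""
    theta = math.atan2(ALPHA * x, p_x)
    I = p_x**2 / (2 * ALPHA) + ALPHA * x**2 / 2
    return theta, I


def rhs(x, p_x):
    return p_x, -ALPHA**2 * x


def rk4_step(x, p_x, dt):
    k1 = rhs(x, p_x)
    k2 = rhs(x + dt / 2 * k1[0], p_x + dt / 2 * k1[1])
    k3 = rhs(x + dt / 2 * k2[0], p_x + dt / 2 * k2[1])
    k4 = rhs(x + dt * k3[0], p_x + dt * k3[1])
    return (x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            p_x + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


theta0, I0 = 0.3, 0.8
x = math.sqrt(2 * I0 / ALPHA) * math.sin(theta0)    # (5.157)
p_x = math.sqrt(2 * I0 * ALPHA) * math.cos(theta0)  # (5.158)

dt, theta_acc, prev = 1e-3, theta0, theta0
max_theta_err = max_I_err = 0.0
for n in range(1, 5001):
    x, p_x = rk4_step(x, p_x, dt)
    theta, I = xp_to_polar(x, p_x)
    step = (theta - prev + math.pi) % (2 * math.pi) - math.pi  # unwrap the angle
    theta_acc, prev = theta_acc + step, theta
    max_theta_err = max(max_theta_err, abs(theta_acc - (theta0 + ALPHA * n * dt)))
    max_I_err = max(max_I_err, abs(I - I0))
print(max_theta_err, max_I_err)
```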


5.6.2 Generating Functions and Integral Invariants 


Generating functions can be used to specify a canonical transfor- 
mation by the prescription given above. We have shown that the 
generating function prescription gives a canonical transformation. 
Here we show how to get a generating function from a canonical 
transformation, and derive the generating function rules. 

The generating function representation of canonical transforma- 
tions can be derived from the Poincaré integral invariants. The 
outline is the following. We first show that, given a canonical 
transformation, the integral invariants imply the existence of a 
function of phase-space coordinates that can be written as a path- 
independent line integral. Then we show that partial derivatives 
of this function, represented in mixed coordinates, give the gener- 
ating function relations between the old and new coordinates. We 
only need to do this for time independent transformations because 
time dependent transformations become time independent in the 
extended phase space. 


Generating functions of type $F_1$
Recall the result about integral invariants from section 5.3. There we found that

$$\oint_{\partial R} \sum_i p_i\, dq^i = \oint_{\partial R'} \sum_i p'_i\, dq'^i, \tag{5.174}$$

where $R'$ is a two-dimensional region in $(q', p')$ coordinates at time $t$, $R = C_t(R')$ is the corresponding region in $(q, p)$ coordinates, and $\partial R$ indicates the boundary of the region $R$. This holds for any region and its boundary. We will show that this implies
there is a function $F(t, q', p')$, which can be defined in terms of line integrals

$$F(t, q', p') - F(t, q'_0, p'_0) = \int_{\gamma} \sum_i p_i\, dq^i - \int_{\gamma'} \sum_i p'_i\, dq'^i, \tag{5.175}$$

where $\gamma'$ is a curve in phase-space coordinates that begins at $\gamma'(0) = (q'_0, p'_0)$ and ends at $\gamma'(1) = (q', p')$, and $\gamma$ is its image under $C_t$.

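Let

$$G_t(\gamma') = \int_{\gamma} \sum_i p_i\, dq^i - \int_{\gamma'} \sum_i p'_i\, dq'^i, \tag{5.176}$$

and let $\gamma'_1$ and $\gamma'_2$ be two paths with the same endpoints. Traversing $\gamma'_1$ and then $\gamma'_2$ backward gives a closed curve $\partial R'$ whose image is $\partial R$, so

$$G_t(\gamma'_1) - G_t(\gamma'_2) = \oint_{\partial R} \sum_i p_i\, dq^i - \oint_{\partial R'} \sum_i p'_i\, dq'^i = 0. \tag{5.177}$$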


So the value of G;(7’) depends only on the endpoints of 7. 
Let

$$G_{t,q'_0,p'_0}(q', p') = G_t(\gamma'), \tag{5.178}$$

where $\gamma'$ is any path from $(q'_0, p'_0)$ to $(q', p')$. Changing the initial point from $(q'_0, p'_0)$ to $(q'_1, p'_1)$ changes the value of $G$ by a constant:

$$G_{t,q'_1,p'_1}(q', p') - G_{t,q'_0,p'_0}(q', p') = G_{t,q'_1,p'_1}(q'_0, p'_0). \tag{5.179}$$

So we can define $F$ so that

$$G_{t,q'_0,p'_0}(q', p') = F(t, q', p') - F(t, q'_0, p'_0), \tag{5.180}$$

demonstrating equation (5.175).

The phase-space point (q, p) in unprimed variables corresponds 
to (q’,p’) in primed variables, at an arbitrary time t. Both p and q 
are determined given q’ and p’. In general, given any two of these 
four quantities we can solve for the other two. If we can solve for 
the momenta in terms of the positions we get a particular class of 
generating functions.¹⁹ We introduce the functions


$$p = f_p(t, q, q'), \qquad p' = f_{p'}(t, q, q') \tag{5.181}$$


that solve the transformation equations $(t, q, p) = C(t, q', p')$ for the momenta in terms of the coordinates at a specified time. With these we introduce a function $F_1(t, q, q')$ such that


$$F_1(t, q, q') = F(t, q', f_{p'}(t, q, q')). \tag{5.182}$$


The function $F_1$ has the same value as $F$ but has different arguments. We will show that this $F_1$ is in fact the generating function for canonical transformations introduced in section 5.6. Let's be explicit about the definition of $F_1$ in terms of a line integral:


$$F_1(t, q, q') - F_1(t, q_0, q'_0) = \int_{(q_0, q'_0)}^{(q, q')} \sum_i \bigl(f_{p,i}(t, q, q')\, dq^i - f_{p',i}(t, q, q')\, dq'^i\bigr). \tag{5.183}$$


The two line integrals can be combined into this one because they are both expressed as integrals along a curve in $(q, q')$.

We can use the path independence of $F_1$ to compute the partial derivatives of $F_1$ with respect to particular components and


19 Point transformations are not in this class: we cannot solve for the momenta
in terms of the positions for point transformations, because for a point trans- 
formation the primed and unprimed coordinates can be deduced from each 
other, so there is not enough information in the coordinates to deduce the 
momenta. 
consequently derive the generating function relations for the momenta.²⁰ So we conclude that


$$(\partial_1 F_1(t, q, q'))_i = f_{p,i}(t, q, q') \tag{5.184}$$

and

$$(\partial_2 F_1(t, q, q'))_i = -f_{p',i}(t, q, q'). \tag{5.185}$$


These are just the configuration and momentum parts of the gen- 
erating function relations for canonical transformation. So start- 
ing with a canonical transformation, we can find a generating 
function that gives the coordinate-momentum part of the trans- 
formation through its derivatives. 

Starting from a general canonical transformation, we have constructed an $F_1$ generating function from which the canonical transformation may be rederived. So, we expect there is a generating function for every canonical transformation.²¹

20 Let $F$ be defined as the path-independent line integral

$$F(x) = \int_{x_0}^{x} \sum_i f_i(x)\, dx^i + F(x_0).$$

The partial derivatives of $F$ do not depend on the constant point $x_0$ or the path from $x_0$ to $x$, so we can choose a path that is convenient for evaluating the partial derivative. Let

$$H(x)(\Delta x^i) = F(x^0, \ldots, x^i + \Delta x^i, \ldots, x^{n-1}) - F(x^0, \ldots, x^i, \ldots, x^{n-1}).$$

The partial derivative of $F$ with respect to the $i$th component of $x$ is

$$\partial_i F(x) = D(H(x))(0).$$

The function $H$ is defined by the line integral

$$\begin{aligned} H(x)(\Delta x^i) &= \int_{(x^0, \ldots, x^i, \ldots, x^{n-1})}^{(x^0, \ldots, x^i + \Delta x^i, \ldots, x^{n-1})} \sum_j f_j(x)\, dx^j \\ &= \int_{x^i}^{x^i + \Delta x^i} f_i(x^0, \ldots, \xi, \ldots, x^{n-1})\, d\xi, \end{aligned}$$

where the second line follows because the line integral is along the coordinate direction $x^i$. This is now an ordinary integral, so

$$\partial_i F(x) = f_i(x).$$


Generating functions of type $F_2$

Point transformations were excluded from the previous argument 
because we could not deduce the momenta from the coordinates. 
However, a similar derivation allows us to make a generating func- 
tion for this case. The integral invariants give us an equality of 
area integrals. There are other ways of writing the equality of 
areas relation (5.83) as a line integral. We can also write 


$$\oint_{\partial R'} \sum_i p'_i\, dq'^i = -\oint_{\partial R'} \sum_i q'^i\, dp'_i. \tag{5.186}$$


The minus sign arises because by flipping the axes we are travers- 
ing the area in the opposite sense. Repeating the argument just 
given, we can define a function 


$$F'(t, q', p') - F'(t, q'_0, p'_0) = \int_{\gamma} \sum_i p_i\, dq^i + \int_{\gamma'} \sum_i q'^i\, dp'_i \tag{5.187}$$


that is independent of the path $\gamma'$. If we can solve for $q'$ and $p$ in terms of $q$ and $p'$ we can define the functions


$$q' = f_{q'}(t, q, p'), \qquad p = f_p(t, q, p') \tag{5.188}$$

and define

$$F_2(t, q, p') = F'(t, f_{q'}(t, q, p'), p'). \tag{5.189}$$


Then the canonical transformation is given as partial derivatives of $F_2$:


$$(\partial_1 F_2(t, q, p'))_i = f_{p,i}(t, q, p') \tag{5.190}$$

and

$$(\partial_2 F_2(t, q, p'))^i = f_{q'}^i(t, q, p'). \tag{5.191}$$


21 There may be some singular cases and topological problems that prevent
this from being rigorously true. 
Relationship between $F_1$ and $F_2$

For canonical transformations that can be described by both an $F_1$ and an $F_2$ there must be a relation between them. The alternate line-integral expressions for the area integral are related. Consider the difference


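$$\begin{aligned} &\bigl(F'(t, q', p') - F'(t, q'_0, p'_0)\bigr) - \bigl(F(t, q', p') - F(t, q'_0, p'_0)\bigr) \\ &\quad = \int_{\gamma'} \sum_i p'_i\, dq'^i + \int_{\gamma'} \sum_i q'^i\, dp'_i \\ &\quad = \sum_i p'_i\, q'^i - \sum_i (p'_0)_i\, (q'_0)^i. \end{aligned} \tag{5.192}$$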
The functions F and F’ are related by an integrated term 
F'tt,¢.p') — F(t,d, p) = pd, (5.193) 
as are Ff, and F> 
F(t, q, p") = F(t, q, q’) = pd. (5.194) 


The generating functions $F_1$ and $F_2$ are related by a Legendre transform:

$$p' = -\partial_2 F_1(t, q, q') \tag{5.195}$$
$$p' q' = -F_1(t, q, q') + F_2(t, q, p') \tag{5.196}$$
$$q' = \partial_2 F_2(t, q, p'). \tag{5.197}$$


The variables $q$ and $t$ are passive in this Legendre transform, so

$$-\partial_1 F_1(t, q, q') + \partial_1 F_2(t, q, p') = 0 \tag{5.198}$$
$$-\partial_0 F_1(t, q, q') + \partial_0 F_2(t, q, p') = 0. \tag{5.199}$$

But $p = \partial_1 F_1(t, q, q')$ from the first transformation, so

$$p = \partial_1 F_2(t, q, p'). \tag{5.200}$$

Furthermore, since $H'(t, q', p') - H(t, q, p) = \partial_0 F_1(t, q, q')$, we can conclude that

$$H'(t, q', p') - H(t, q, p) = \partial_0 F_2(t, q, p'). \tag{5.201}$$
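For the polar-canonical transformation the Legendre relation (5.194) can be carried out explicitly: eliminating $\theta$ from $F_2(x, I) = F_1(x, \theta) + I\theta$ by solving (5.157) gives $\theta = \arcsin\bigl(x\sqrt{\alpha/(2I)}\bigr)$ on the branch $p_x > 0$. This Python sketch checks (5.205) and (5.206), that is $p_x = \partial_1 F_2$ and $\theta = \partial_2 F_2$, by finite differences; `ALPHA`, the branch choice, and the sample point are assumptions.

```python
import math

ALPHA = 1.3  # assumed; F1 is the polar-canonical generating function (5.164)


def F1(x, theta):
    return 0.5 * ALPHA * x**2 / math.tan(theta)


def theta_of(x, I):
    # Solve x = sqrt(2I/ALPHA) sin(theta) for theta; we assume the branch
    # |theta| < pi/2, i.e. p_x > 0.
    return math.asin(x * math.sqrt(ALPHA / (2 * I)))


def F2(x, I):
    # (5.194): F2(t, q, p') = F1(t, q, q') + p' q', with q' = theta eliminated
    theta = theta_of(x, I)
    return F1(x, theta) + I * theta


def partial(fun, args, k, h=1e-6):
    """Central finite difference of fun with respect to its k-th argument."""
    plus, minus = list(args), list(args)
    plus[k] += h
    minus[k] -= h
    return (fun(*plus) - fun(*minus)) / (2 * h)


x, I = 0.5, 1.0
theta = theta_of(x, I)
p_x = math.sqrt(2 * I * ALPHA) * math.cos(theta)   # (5.158)
print(partial(F2, (x, I), 0), p_x)     # p = d1 F2
print(partial(F2, (x, I), 1), theta)   # q' = d2 F2
```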
5.6.3 Classes of Generating Functions


In summary, we have used $F_1$-type generating functions to construct canonical transformations:

$$p = \partial_1 F_1(t, q, q') \tag{5.202}$$
$$p' = -\partial_2 F_1(t, q, q') \tag{5.203}$$
$$H'(t, q', p') - H(t, q, p) = \partial_0 F_1(t, q, q'). \tag{5.204}$$


We can also represent canonical transformations with generating functions of the form $F_2(t, q, p')$, where the third argument of $F_2$ is the momentum in the primed system.²²


$$p = \partial_1 F_2(t, q, p') \tag{5.205}$$
$$q' = \partial_2 F_2(t, q, p') \tag{5.206}$$
$$H'(t, q', p') - H(t, q, p) = \partial_0 F_2(t, q, p') \tag{5.207}$$


As in the $F_1$ case, putting the transformation in explicit form requires that appropriate inverse functions be constructed to allow the solution of the equations.

Similarly, we can construct two other forms for generating functions, named mnemonically enough $F_3$ and $F_4$:


$$q = -\partial_1 F_3(t, p, q') \tag{5.208}$$
$$p' = -\partial_2 F_3(t, p, q') \tag{5.209}$$
$$H'(t, q', p') - H(t, q, p) = \partial_0 F_3(t, p, q') \tag{5.210}$$


and 


22 The various generating functions are traditionally known by the names $F_1$, $F_2$, $F_3$, and $F_4$. Please don't blame us.
$$q = -\partial_1 F_4(t, p, p') \tag{5.211}$$
$$q' = \partial_2 F_4(t, p, p') \tag{5.212}$$
$$H'(t, q', p') - H(t, q, p) = \partial_0 F_4(t, p, p') \tag{5.213}$$


In every case, if the generating function does not depend explic- 
itly on time then the Hamiltonians are obtained from each other 
purely by composition with the appropriate canonical transforma- 
tion. If the generating function depends on time, then there are 
additional terms. 

The generating functions presented treat the coordinates and 
momenta collectively. One could define more complicated gen- 
erating functions for which the transformation of each degree of 
freedom is specified by generating functions of different types. 


Generating functions in extended phase space 

We can represent canonical transformations with mixed-variable generating functions. We can extend these to represent transformations in the extended phase space. Let $F_2$ be a generating function with arguments $(t, q, p')$. Then the corresponding $F_2^e$ in the extended phase space can be taken to be


$$F_2^e(\tau; q, t; p', p'_t) = t\, p'_t + F_2(t, q, p'). \tag{5.214}$$


The relations between the coordinates and the momenta are the 
same as before. We also have 


$$\begin{aligned} p_t &= (\partial_1 F_2^e)_n(\tau; q, t; p', p'_t) = p'_t + \partial_0 F_2(t, q, p') \\ t' &= (\partial_2 F_2^e)^n(\tau; q, t; p', p'_t) = t. \end{aligned} \tag{5.215}$$


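The first equation gives the relationship between the original Hamiltonians: since the extended-phase-space Hamiltonians $H(t, q, p) + p_t$ and $H'(t, q', p') + p'_t$ are related by composition, substituting $p_t = p'_t + \partial_0 F_2(t, q, p')$ gives

$$H'(t, q', p') = H(t, q, p) + \partial_0 F_2(t, q, p'), \tag{5.216}$$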


as required. We know that time-independent canonical transfor- 
mations have symplectic qp part. The generating function rep- 
resentation of a time dependent transformation does not depend 
on the independent variable in the extended phase space. So, in 
extended phase space the qp part of the transformation, which 
includes the time and the momentum conjugate to time, is symplectic.
5.6.4 Point Transformations 


Point transformations can be represented in terms of a generating function of type $F_2$. Equations (5.6), which define a canonical point transformation derived from a coordinate transformation $F$, are:


$$(t, q, p) = C(t, q', p') = \bigl(t, F(t, q'), p'\,[\partial_1 F(t, q')]^{-1}\bigr). \tag{5.217}$$


Let S be the inverse transformation of F with respect to the 
second argument 


$$q' = S(t, q), \tag{5.218}$$


so that $q' = S(t, F(t, q'))$. The momentum transformation that accompanies this coordinate transformation is


$$p = p'\, \partial_1 S(t, q). \tag{5.219}$$


We can find the generating function $F_2$ that gives this transformation by integrating equation (5.206) to get


$$F_2(t, q, p') = p' S(t, q) + \phi(t, q). \tag{5.220}$$

Substituting this into equation (5.205) we get

$$p = p'\, \partial_1 S(t, q) + \partial_1 \phi(t, q). \tag{5.221}$$


We do not need the freedom provided by $\phi$, so we can set it equal to zero:

$$F_2(t, q, p') = p' S(t, q), \tag{5.222}$$

with

$$p = p'\, \partial_1 S(t, q). \tag{5.223}$$


So this $F_2$ gives the canonical transformation of equations (5.218) and (5.219).

The canonical transformation for the coordinate transformation 
S is the inverse of the canonical transformation for F. By design 
$F$ and $S$ are inverses on the coordinate arguments. The identity function is $q' = I(q') = S(t, F(t, q'))$. Differentiating yields


$$1 = \partial_1 S(t, F(t, q'))\, \partial_1 F(t, q'), \tag{5.224}$$

so

$$\partial_1 F(t, q') = \bigl(\partial_1 S(t, F(t, q'))\bigr)^{-1}. \tag{5.225}$$


Using this, the relation between the momenta (5.223) is

$$p = p'\bigl(\partial_1 F(t, q')\bigr)^{-1}, \tag{5.226}$$


showing that $F_2$ gives a point transformation equivalent to the point transformation (5.217). So from this other point of view we see that the point transformation is canonical.

The $F_1$ that corresponds to the $F_2$ for a point transformation is

$$\begin{aligned} F_1(t, q, q') &= F_2(t, q, p') - p' q' \\ &= p' S(t, q) - p' q' \\ &= 0. \end{aligned} \tag{5.227}$$


Polar and rectangular coordinates 
A commonly required point transformation is the transition be- 
tween polar coordinates and rectangular coordinates: 


$$x = r\cos\theta, \qquad y = r\sin\theta. \tag{5.228}$$


Using the formula for the generating function of a point transformation just derived:

$$F_2(t; r, \theta; p_x, p_y) = [p_x, p_y] \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}. \tag{5.229}$$


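So the full transformation is derived:

$$\begin{aligned} (x, y) &= \partial_2 F_2(t; r, \theta; p_x, p_y) = (r\cos\theta, r\sin\theta) \\ [p_r, p_\theta] &= \partial_1 F_2(t; r, \theta; p_x, p_y) = [p_x\cos\theta + p_y\sin\theta,\ -p_x r\sin\theta + p_y r\cos\theta]. \end{aligned} \tag{5.230}$$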
We can isolate the rectangular coordinates to one side of the transformation and the polar coordinates to the other:

$$p_r = \frac{1}{r}(p_x x + p_y y), \qquad p_\theta = -p_x y + p_y x. \tag{5.231}$$


So, interpreted in terms of Newtonian vectors, $p_r = \hat{r} \cdot \vec{p}$ is the radial component of the linear momentum and $p_\theta = (\vec{r} \times \vec{p}) \cdot \hat{z}$ is the angular momentum about the origin, whose magnitude is $\|\vec{r} \times \vec{p}\|$. Since the point transformation is time independent, the Hamiltonian transforms by composition.
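The relations (5.230) and (5.231) can be spot-checked by differentiating (5.229) numerically. This Python sketch uses a hypothetical sample point and, to illustrate composition of the Hamiltonian, an assumed free particle of unit mass.

```python
import math


def F2(r, theta, p_x, p_y):
    """(5.229): F2(t; r, theta; p_x, p_y) = p_x r cos(theta) + p_y r sin(theta)."""
    return p_x * r * math.cos(theta) + p_y * r * math.sin(theta)


def partial(fun, args, k, h=1e-6):
    """Central finite difference of fun with respect to its k-th argument."""
    plus, minus = list(args), list(args)
    plus[k] += h
    minus[k] -= h
    return (fun(*plus) - fun(*minus)) / (2 * h)


args = (1.7, 0.6, 0.4, -0.9)          # hypothetical (r, theta, p_x, p_y)
r, theta, p_x, p_y = args
x, y = partial(F2, args, 2), partial(F2, args, 3)           # d2 F2
p_r, p_theta = partial(F2, args, 0), partial(F2, args, 1)   # d1 F2

print(x - r * math.cos(theta), y - r * math.sin(theta))
print(p_r - (p_x * x + p_y * y) / r, p_theta - (x * p_y - y * p_x))   # (5.231)

# A free particle (m = 1, assumed): composing with the transformation gives
# p_r^2/2 + p_theta^2/(2 r^2), which should equal (p_x^2 + p_y^2)/2.
print(0.5 * (p_x**2 + p_y**2) - 0.5 * (p_r**2 + p_theta**2 / r**2))
```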


Rotating coordinates 

A useful time-dependent point transformation is the transition to 
a rotating coordinate system. This is most easily accomplished in 
polar coordinates. Here we have 


$$r' = r, \qquad \theta' = \theta - \Omega t, \tag{5.232}$$


where $\Omega$ is the angular velocity of the moving frame of reference. The generating function is


$$F_2(t; r, \theta; p'_r, p'_\theta) = r\, p'_r + (\theta - \Omega t)\, p'_\theta. \tag{5.233}$$


This yields the transformation equations 


$$r' = r, \qquad \theta' = \theta - \Omega t, \qquad p_r = p'_r, \qquad p_\theta = p'_\theta, \tag{5.234}$$


which show that the momenta are the same in both coordinate 
systems. However, here the Hamiltonian is not a simple composi- 
tion: 


$$H'(t; r', \theta'; p'_r, p'_\theta) = H(t; r', \theta' + \Omega t; p'_r, p'_\theta) - \Omega p'_\theta. \tag{5.235}$$


The Hamiltonians differ by the derivative of the generating func- 
tion with respect to the time argument. In transforming to a ro- 
tating frame the values of the Hamiltonians differ by the product 
of the angular momentum and the angular velocity of the frame.
Notice that this addition to the Hamiltonian is the same as was 
found earlier (5.57). 


Exercise 5.14: Rotating coordinates in extended phase space 


In the extended phase space the time is one of the coordinates. Carry out 
the transformation to rotating coordinates using an $F_2$-type generating
function in the extended phase space. Compare the Hamiltonian, ob- 
tained by composition with the transformation, to Hamiltonian (5.235). 


Two-body problem 

In this example we illustrate how canonical transformations can 
be used to eliminate some of the degrees of freedom, leaving an 
essential problem with fewer degrees of freedom. 

Suppose only certain combinations of the coordinates appear in 
the Hamiltonian. We make a canonical transformation to a new 
set of phase-space coordinates such that these combinations of 
the old phase space coordinates are some of the new phase space 
coordinates. We choose other independent combinations of the 
coordinates to complete the set. The advantage is that these other 
independent coordinates do not appear in the new Hamiltonian, 
so the momenta conjugate to them are conserved quantities. 

Let’s see how this idea lets us reduce the problem of two gravi- 
tating bodies to the simpler problem of the relative motion of the 
two bodies, and in the process discover that the momentum of the 
center of mass is conserved. 

Consider the motion of two masses $m_1$ and $m_2$, subject only to a mutual gravitational attraction described by the potential $V(r)$. This problem has six degrees of freedom. The rectangular coordinates of the particles are $x_1$ and $x_2$, with conjugate momenta $p_1$ and $p_2$. Each of these is a structure of the three rectangular components. The distance between the particles is $r = \|x_1 - x_2\|$. The Hamiltonian for the two-body problem is:


$$H(t; x_1, x_2; p_1, p_2) = \frac{p_1^2}{2m_1} + \frac{p_2^2}{2m_2} + V(r). \tag{5.236}$$


We do not need to further specify V at this point. 
We note that the only linear combination of coordinates that appears in the Hamiltonian is $x_2 - x_1$. We choose new coordinates so that one tuple of the new coordinates is this combination


$$x = x_2 - x_1 \tag{5.237}$$


and to complete the set of new coordinates we choose another 
tuple to be some independent linear combination 


$$X = a x_1 + b x_2, \tag{5.238}$$


where $a$ and $b$ are to be determined. We can use an $F_2$-type generating function


$$F_2(t; x_1, x_2; p, P) = (x_2 - x_1)\,p + (a x_1 + b x_2)\,P, \tag{5.239}$$


where p and P will be the new momenta conjugate to x and X, 
respectively. We deduce 


$$
\begin{aligned}
(x, X) &= \partial_2 F_2(t; x_1, x_2; p, P) = (x_2 - x_1,\; a x_1 + b x_2),\\
[p_1, p_2] &= \partial_1 F_2(t; x_1, x_2; p, P) = [-p + aP,\; p + bP].
\end{aligned} \tag{5.240}
$$


We can solve these for the new momenta: 


$$
\begin{aligned}
P &= \frac{p_1 + p_2}{a + b}, && (5.241)\\
p &= \frac{a p_2 - b p_1}{a + b}. && (5.242)
\end{aligned}
$$


The generating function is not time dependent so the new 
Hamiltonian is the old Hamiltonian composed with the trans- 
formation: 


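$$
\begin{aligned}
H'(t; x, X; p, P) &= \frac{(-p + aP)^2}{2m_1} + \frac{(p + bP)^2}{2m_2} + V(\|x\|)\\
&= \frac{p^2}{2\mu} + \frac{P^2}{2M} + V(\|x\|) + \left(\frac{b}{m_2} - \frac{a}{m_1}\right) pP,
\end{aligned} \tag{5.243}
$$
with the definitions
$$
\begin{aligned}
\frac{1}{\mu} &= \frac{1}{m_1} + \frac{1}{m_2}, && (5.244)\\
\frac{1}{M} &= \frac{a^2}{m_1} + \frac{b^2}{m_2}. && (5.245)
\end{aligned}
$$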


We recognize $\mu$ as the usual “reduced mass.”

Notice that if the term proportional to pP were not present 
then the x and X degrees of freedom would not be coupled at all, 
and furthermore, the X part of the Hamiltonian would be just 
the Hamiltonian of a free particle which is trivial to solve. The 
condition that the “cross terms” disappear is 


$$\frac{b}{m_2} - \frac{a}{m_1} = 0, \tag{5.246}$$


which is satisfied by 


$$
\begin{aligned}
a &= c\,m_1, && (5.247)\\
b &= c\,m_2 && (5.248)
\end{aligned}
$$


for any $c$. For the transformation to be defined, $c$ must be nonzero. So with this choice the Hamiltonian becomes


$$H'(t; x, X; p, P) = H_X(t, X, P) + H_x(t, x, p) \tag{5.249}$$
with
$$H_x(t, x, p) = \frac{p^2}{2\mu} + V(r) \tag{5.250}$$
and
$$H_X(t, X, P) = \frac{P^2}{2M}. \tag{5.251}$$


The reduced mass is the same as before, and now 


$$\frac{1}{M} = c^2 (m_1 + m_2). \tag{5.252}$$


Notice that without further specifying c the problem has been 
separated into the problem of determining the relative motion of 
the two masses, and the problem of the other degrees of freedom. 
We did not need to have a priori knowledge that the center of 
mass might be important; and, in fact, only for the particular choice $c = (m_1 + m_2)^{-1}$ does $X$ become the center of mass.
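A quick numerical check of (5.241) through (5.252) may help. The Python sketch below is standalone (it does not use the book's Scheme system); it evaluates the original Hamiltonian (5.236) and the separated Hamiltonian at one randomly chosen phase-space point. The Newtonian potential with G = 1, the masses, and the value c = 0.7 are illustrative assumptions.

```python
import math
import random

def norm2(v):
    # squared Euclidean length of a 3-tuple
    return sum(c * c for c in v)

def H_old(x1, x2, p1, p2, m1, m2, V):
    # Hamiltonian (5.236) in the original rectangular coordinates
    r = math.sqrt(norm2([b - a for a, b in zip(x1, x2)]))
    return norm2(p1) / (2 * m1) + norm2(p2) / (2 * m2) + V(r)

def to_new(x1, x2, p1, p2, m1, m2, c):
    # new coordinates (5.237)-(5.238) and momenta (5.241)-(5.242), with a = c m1, b = c m2
    a, b = c * m1, c * m2
    x = [v2 - v1 for v1, v2 in zip(x1, x2)]
    X = [a * v1 + b * v2 for v1, v2 in zip(x1, x2)]
    p = [(a * q2 - b * q1) / (a + b) for q1, q2 in zip(p1, p2)]
    P = [(q1 + q2) / (a + b) for q1, q2 in zip(p1, p2)]
    return x, X, p, P

def H_new(x, p, P, m1, m2, c, V):
    # separated Hamiltonian (5.249)-(5.252)
    mu = 1 / (1 / m1 + 1 / m2)          # reduced mass (5.244)
    M = 1 / (c * c * (m1 + m2))          # (5.252)
    return norm2(p) / (2 * mu) + V(math.sqrt(norm2(x))) + norm2(P) / (2 * M)

random.seed(1)
m1, m2, c = 3.0, 5.0, 0.7
V = lambda r: -m1 * m2 / r               # Newtonian potential with G = 1 (assumed)
x1, x2, p1, p2 = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(4))
x, X, p, P = to_new(x1, x2, p1, p2, m1, m2, c)
print(H_old(x1, x2, p1, p2, m1, m2, V) - H_new(x, p, P, m1, m2, c, V))
```

For any nonzero c the printed difference should be at the level of rounding error; with c = 1/(m1 + m2) the coordinate X coincides with the center of mass.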


Exercise 5.15: Jacobi coordinates 


Consider an n-body problem in which the potential energy is the sum of the potential energy of each pair of bodies considered separately, and this potential energy depends only on the distance between these bodies. A Hamiltonian for this system is


$$H = T + V \tag{5.253}$$

with 
$$T(t; x_0, x_1, \ldots, x_{n-1}; p_0, p_1, \ldots, p_{n-1}) = \sum_{i=0}^{n-1} \frac{p_i^2}{2m_i} \tag{5.254}$$

and 

$$V(t; x_0, x_1, \ldots, x_{n-1}; p_0, p_1, \ldots, p_{n-1}) = \sum_{i<j} f_{ij}(\|x_i - x_j\|), \tag{5.255}$$


where $x_i$ is the tuple of rectangular coordinates for body $i$, and $p_i$ is the tuple of conjugate linear momenta for body $i$.

The potential energy of the system depends only on the relative po- 
sitions of the bodies, so the relative motion decouples from the center 
of mass motion. There is more than one canonical transformation that 
accomplishes this decomposition of center of mass and relative motion 
in the n-body problem. 

We introduce a notation for the center of mass of the bodies with indices less than or equal to $i$


$$X_i = \frac{\sum_{j=0}^{i} m_j x_j}{\nu_i}, \tag{5.256}$$

with $\nu_i = \sum_{j=0}^{i} m_j$.

a. Define one new coordinate to be the center of mass of the system,
$$x'_0 = X_{n-1}, \tag{5.257}$$
and $n - 1$ other coordinates to be
$$x'_i = x_i - X_{n-1}, \tag{5.258}$$


for $i > 0$, the differences of the position of body $i$ and the center of mass of the system. Find the associated canonical momenta using an $F_2$-type generating function. Show that the potential energy can be written in terms of the coordinates $x'_i$ for $i > 0$. Show that the kinetic energy is not in the form of a sum of squares of momenta divided by mass constants. These phase-space coordinates are known as canonical heliocentric coordinates.


b. The Jacobi coordinates isolate the center of mass motion, without 
spoiling the usual diagonal quadratic form of the kinetic energy. The 
Jacobi coordinates are defined by 


$$x'_i = x_i - X_{i-1}, \tag{5.259}$$


the difference of the position of body i and the center of mass of bodies 
with lower indices, and 


$$x'_0 = X_{n-1}, \tag{5.260}$$


the center of mass of all the bodies. 

Complete the canonical transformation by finding the conjugate momenta using an $F_2$-type generating function. Show that the kinetic energy can still be written in the form


$$T(t; x'_0, x'_1, \ldots, x'_{n-1}; p'_0, p'_1, \ldots, p'_{n-1}) = \sum_{i=0}^{n-1} \frac{(p'_i)^2}{2m'_i} \tag{5.261}$$


for some constants $m'_i$, and that the potential $V$ can be written solely in terms of the Jacobi coordinates $x'_i$ with indices $i > 0$.


c. Are there any other canonical transformations that isolate the center 
of mass and leave the kinetic energy as a sum of squares of momenta? 
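The coordinate half of part b can be tabulated directly from (5.256), (5.259), and (5.260). The Python sketch below only illustrates those definitions (it does not construct the conjugate momenta the exercise asks for); the masses, positions, and translation are arbitrary. It checks the property that motivates the construction: translating every body by the same displacement changes only $x'_0$.

```python
def center_of_mass(xs, ms, i):
    # X_i of (5.256): center of mass of bodies 0..i
    nu = sum(ms[:i + 1])
    return [sum(ms[j] * xs[j][k] for j in range(i + 1)) / nu for k in range(3)]

def jacobi(xs, ms):
    # Jacobi coordinates: x'_0 = X_{n-1} (5.260); x'_i = x_i - X_{i-1} for i > 0 (5.259)
    n = len(ms)
    new = [center_of_mass(xs, ms, n - 1)]
    for i in range(1, n):
        X_prev = center_of_mass(xs, ms, i - 1)
        new.append([xs[i][k] - X_prev[k] for k in range(3)])
    return new

ms = [1.0, 2.0, 3.0, 0.5]
xs = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5], [2.0, 0.0, -0.5], [-1.0, 1.5, 2.0]]
d = [10.0, -3.0, 7.0]                                    # a common translation
J = jacobi(xs, ms)
J_shifted = jacobi([[x[k] + d[k] for k in range(3)] for x in xs], ms)
print(J[1], J_shifted[1])                                # relative coordinates agree
```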


Epicyclic motion 

It is often useful to compose a sequence of canonical transforma- 
tions to make up the transformation we need for any particular 
mechanical problem. The transformations we have supplied are 
especially useful as components in these computations. 

We will illustrate the use of canonical transformations to learn 
about planar motion in a central field. The strategy will be to 
consider perturbations of circular motion in the central field. The 
analysis will proceed by transforming to a rotating coordinate system that rides on a circular reference orbit, and then making approximations that restrict the analysis to orbits that differ from the circular orbit only slightly.

Recall that in rectangular coordinates we could easily write a 
Hamiltonian for the motion of a particle of mass m in a field 
defined by a potential energy that is only a function of the distance 
from the origin as follows: 


$$H(t; x, y; p_x, p_y) = \frac{p_x^2 + p_y^2}{2m} + V\!\left(\sqrt{x^2 + y^2}\right) \tag{5.262}$$
In this coordinate system Hamilton’s equations are easy, and they
are exactly what is needed to develop trajectories by numerical 
integration, but the expressions are not very illuminating: 


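$$
\begin{aligned}
Dx &= \frac{p_x}{m} && (5.263)\\
Dy &= \frac{p_y}{m} && (5.264)\\
Dp_x &= -DV\!\left(\sqrt{x^2+y^2}\right)\frac{x}{\sqrt{x^2+y^2}} && (5.265)\\
Dp_y &= -DV\!\left(\sqrt{x^2+y^2}\right)\frac{y}{\sqrt{x^2+y^2}} && (5.266)
\end{aligned}
$$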
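To make the remark about numerical integration concrete, here is a minimal Python sketch of (5.263) through (5.266) advanced with the classical fourth-order Runge-Kutta method (our choice; the text does not prescribe an integrator). It assumes $m = 1$ and the Kepler potential $V(r) = -1/r$, and checks that the energy (5.262) and the angular momentum $x p_y - y p_x$ stay constant along the computed orbit.

```python
import math

def rhs(state, m, dV):
    # Hamilton's equations (5.263)-(5.266); dV is the derivative DV of the potential
    x, y, px, py = state
    r = math.hypot(x, y)
    f = dV(r) / r
    return (px / m, py / m, -f * x, -f * y)

def rk4_step(f, state, h):
    # one classical Runge-Kutta step for the autonomous system D state = f(state)
    k1 = f(state)
    k2 = f(tuple(s + h / 2 * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + h / 2 * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def energy(state, m, V):
    # the Hamiltonian (5.262)
    x, y, px, py = state
    return (px**2 + py**2) / (2 * m) + V(math.hypot(x, y))

m = 1.0
V = lambda r: -1.0 / r            # assumed Kepler potential
dV = lambda r: 1.0 / r**2
state = (1.0, 0.0, 0.0, 1.1)      # a mildly eccentric orbit
E0 = energy(state, m, V)
for _ in range(2000):
    state = rk4_step(lambda s: rhs(s, m, dV), state, 0.01)
print(energy(state, m, V) - E0, state[0] * state[3] - state[1] * state[2])
```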


We can learn more by converting to polar coordinates centered 
on the source of our field. 


$$
\begin{aligned}
x &= r\cos\phi && (5.267)\\
y &= r\sin\phi && (5.268)
\end{aligned}
$$


This coordinate system explicitly incorporates the geometrical 
symmetry of the potential energy. Using the results of the previ- 
ous section we can write the new Hamiltonian as: 


$$H'(t; r, \phi; p_r, p_\phi) = \frac{p_r^2}{2m} + \frac{p_\phi^2}{2mr^2} + V(r) \tag{5.269}$$


We can now write Hamilton’s equations in these new coordinates, 
and they are much more illuminating than the equations expressed 
in rectangular coordinates: 


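$$
\begin{aligned}
Dr &= \frac{p_r}{m} && (5.270)\\
D\phi &= \frac{p_\phi}{mr^2} && (5.271)\\
Dp_r &= \frac{p_\phi^2}{mr^3} - DV(r) && (5.272)\\
Dp_\phi &= 0 && (5.273)
\end{aligned}
$$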


We see that the angular momentum $p_\phi$ is conserved, and we are free to choose its constant value, so $D\phi$ depends only on $r$. We also see that we can establish a circular orbit at any radius $R_0$: we choose $p_\phi = p_{\phi 0}$, so that $p_{\phi 0}^2/(mR_0^3) - DV(R_0) = 0$. This will ensure that $Dp_r = 0$, and thus, with $p_r = 0$, $Dr = 0$. The (square of the) angular velocity of this circular orbit is
$$\Omega^2 = \frac{DV(R_0)}{mR_0}. \tag{5.274}$$
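The circular-orbit condition and (5.274) can be checked numerically. In this Python sketch the mass, the radius, and the force law $DV(r) = 3/r^2$ are arbitrary illustrative choices; it confirms that $\Omega$ from (5.274) equals $D\phi = p_{\phi 0}/(mR_0^2)$ from (5.271) and that $Dp_r$ from (5.272) vanishes.

```python
import math

def circular_orbit(R0, m, dV):
    # p_phi0 from p_phi0**2/(m R0**3) - DV(R0) = 0, and Omega from (5.274)
    p_phi0 = math.sqrt(m * R0**3 * dV(R0))
    Omega = math.sqrt(dV(R0) / (m * R0))
    return p_phi0, Omega

m, R0 = 2.0, 1.5
dV = lambda r: 3.0 / r**2                      # illustrative attractive force law
p_phi0, Omega = circular_orbit(R0, m, dV)
Dphi = p_phi0 / (m * R0**2)                    # (5.271) on the circular orbit
Dpr = p_phi0**2 / (m * R0**3) - dV(R0)         # (5.272) on the circular orbit
print(Omega, Dphi, Dpr)
```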
It is instructive to consider how orbits that are close to the circular 
orbit differ from the circular orbit. This is best done in a frame 
where a body moving in the circular orbit is a stationary point at 
the origin. We can do this by converting to coordinates that are 
rotating with the circular orbit and centered on the orbiting body. 
We will do this in three stages. First we will transform to a polar 
coordinate system that is rotating at angular velocity $\Omega$. Then
we will return to rectangular coordinates, and finally, we will shift 
the coordinates so the origin is on the reference circular orbit. 
We start by examining the system in rotating polar coordinates. 
This is a time-varying coordinate transformation: 


$$
\begin{aligned}
r' &= r && (5.275)\\
\phi' &= \phi - \Omega t && (5.276)\\
p'_r &= p_r && (5.277)\\
p'_\phi &= p_\phi && (5.278)
\end{aligned}
$$


Using the formulas developed in the last section we can now write 
the new Hamiltonian directly: 


$$H''(t; r', \phi'; p'_r, p'_\phi) = \frac{(p'_r)^2}{2m} + \frac{(p'_\phi)^2}{2m(r')^2} + V(r') - \Omega p'_\phi \tag{5.279}$$


We see that $H''$ is not time dependent, and therefore it is conserved, but it is not energy. Energy is not conserved in the moving
coordinate system, but what is conserved here is a new quantity 
which combines the energy with the product of the angular mo- 
mentum of the particle in the new frame and the angular velocity 
of the frame. We will want to keep track of this term. 

Next, we return to rectangular coordinates, but they are rotat- 
ing with the reference circular orbit: 


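$$
\begin{aligned}
x'' &= r'\cos\phi' && (5.280)\\
y'' &= r'\sin\phi' && (5.281)\\
p''_x &= p'_r\cos\phi' - \frac{p'_\phi}{r'}\sin\phi' && (5.282)\\
p''_y &= p'_r\sin\phi' + \frac{p'_\phi}{r'}\cos\phi'. && (5.283)
\end{aligned}
$$
The Hamiltonian is
$$
\begin{aligned}
H'''(t; x'', y''; p''_x, p''_y) &= \frac{(p''_x)^2 + (p''_y)^2}{2m} + \Omega\,(y'' p''_x - x'' p''_y)\\
&\quad + V\!\left(\sqrt{(x'')^2 + (y'')^2}\right).
\end{aligned} \tag{5.284}
$$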
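Since (5.280) through (5.283) is a time-independent point transformation, (5.284) should agree with (5.279) at corresponding points. The Python sketch below checks this at a single sample point; the potential $V(r) = -1/r$, the value $\Omega = 0.8$, and the sample point are assumptions for illustration only.

```python
import math

def H_rot_polar(r, phi, pr, pphi, m, Omega, V):
    # rotating polar Hamiltonian (5.279)
    return pr**2 / (2 * m) + pphi**2 / (2 * m * r**2) + V(r) - Omega * pphi

def polar_to_rect(r, phi, pr, pphi):
    # point transformation (5.280)-(5.283)
    x, y = r * math.cos(phi), r * math.sin(phi)
    px = pr * math.cos(phi) - pphi / r * math.sin(phi)
    py = pr * math.sin(phi) + pphi / r * math.cos(phi)
    return x, y, px, py

def H_rot_rect(x, y, px, py, m, Omega, V):
    # rotating rectangular Hamiltonian (5.284)
    return (px**2 + py**2) / (2 * m) + Omega * (y * px - x * py) + V(math.hypot(x, y))

m, Omega = 1.0, 0.8
V = lambda r: -1.0 / r
point = (1.3, 0.4, 0.2, 1.1)                    # (r', phi', p'_r, p'_phi)
h_polar = H_rot_polar(*point, m, Omega, V)
h_rect = H_rot_rect(*polar_to_rect(*point), m, Omega, V)
print(h_polar - h_rect)
```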
With one more quick manipulation we shift the coordinate system so that the origin is out on our circular orbit. We define new rectangular coordinates $\xi$ and $\eta$ with the following simple canonical transformation of coordinates and momenta:


$$
\begin{aligned}
\xi &= x'' - R_0 && (5.285)\\
\eta &= y'' && (5.286)\\
p_\xi &= p''_x && (5.287)\\
p_\eta &= p''_y. && (5.288)
\end{aligned}
$$


In this final coordinate system the Hamiltonian is 


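$$
\begin{aligned}
H''''(t; \xi, \eta; p_\xi, p_\eta) &= \frac{p_\xi^2 + p_\eta^2}{2m} + \Omega\left(\eta p_\xi - (\xi + R_0) p_\eta\right)\\
&\quad + V\!\left(\sqrt{(\xi + R_0)^2 + \eta^2}\right),
\end{aligned} \tag{5.289}
$$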


and Hamilton’s equations are uselessly complicated, but the next step is to consider only trajectories for which the coordinates $\xi$ and $\eta$ are small compared with $R_0$. Under this assumption we
will be able to construct approximate equations of motion for these 
trajectories that are linear in the coordinates, thus yielding simple 
analyzable motion. Note that up until here, we have made no 
approximations. The equations above are perfectly accurate for 
any trajectories in a central field. 

The idea is to expand the potential-energy term in the Hamilto- 
nian as a series and to discard any term higher than second order 
in the coordinates, thus giving us first-order accurate Hamilton’s 
equations: 


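$$
\begin{aligned}
U(\xi, \eta) &= V\!\left(\sqrt{(\xi + R_0)^2 + \eta^2}\right) && (5.290)\\
&= V\!\left(R_0 + \xi + \frac{\eta^2}{2R_0} + \cdots\right) && (5.291)\\
&= V(R_0) + DV(R_0)\left(\xi + \frac{\eta^2}{2R_0}\right) + \frac{D^2V(R_0)}{2}\,\xi^2 + \cdots. && (5.292)
\end{aligned}
$$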


So the (negated) generalized forces are: 


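$$
\begin{aligned}
\partial_0 U(\xi, \eta) &= DV(R_0) + D^2V(R_0)\,\xi + \cdots && (5.293)\\
\partial_1 U(\xi, \eta) &= DV(R_0)\,\frac{\eta}{R_0} + \cdots && (5.294)
\end{aligned}
$$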
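One way to confirm that (5.292) keeps every term through second order is to watch how its error shrinks. The Python sketch below assumes $V(r) = -k/r$ with $k = 1$ and $R_0 = 1$, so that $DV(R_0) = 1$ and $D^2V(R_0) = -2$, and compares the exact $U$ of (5.290) with the truncation along a fixed direction. Halving the displacement should reduce the error by a factor of about 8, as expected for a third-order remainder.

```python
import math

k, R0 = 1.0, 1.0                          # assumed V(r) = -k/r and reference radius
V = lambda r: -k / r
dV = lambda r: k / r**2
d2V = lambda r: -2 * k / r**3

def U_exact(xi, eta):
    # the potential (5.290) in the shifted rotating coordinates
    return V(math.hypot(xi + R0, eta))

def U_quadratic(xi, eta):
    # (5.292) truncated after the second-order terms
    return V(R0) + dV(R0) * (xi + eta**2 / (2 * R0)) + d2V(R0) * xi**2 / 2

def truncation_error(h):
    # error along the fixed direction (0.3, 0.5), scaled by h
    xi, eta = 0.3 * h, 0.5 * h
    return abs(U_exact(xi, eta) - U_quadratic(xi, eta))

ratio = truncation_error(1e-2) / truncation_error(5e-3)
print(ratio)
```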


With this expansion we obtain the linearized Hamilton’s equa- 
tions: 


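$$
\begin{aligned}
D\xi &= \frac{p_\xi}{m} + \Omega\eta && (5.295)\\
D\eta &= \frac{p_\eta}{m} - \Omega(\xi + R_0) && (5.296)\\
Dp_\xi &= -DV(R_0) - D^2V(R_0)\,\xi + \cdots + \Omega p_\eta && (5.297)\\
Dp_\eta &= -DV(R_0)\,\frac{\eta}{R_0} + \cdots - \Omega p_\xi. && (5.298)
\end{aligned}
$$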


Of course, once we have linear equations we know how to solve 
them exactly. Since the linearized Hamiltonian is conserved we 
cannot get exponential expansion or collapse. So the possible 
solutions are quite limited. It is instructive to convert these equations into a second-order system. We use $\Omega^2 = DV(R_0)/(mR_0)$ to eliminate the $DV$ terms:


$$
\begin{aligned}
D^2\xi - 2\Omega D\eta &= \left(\Omega^2 - \frac{D^2V(R_0)}{m}\right)\xi, && (5.299)\\
D^2\eta + 2\Omega D\xi &= 0. && (5.300)
\end{aligned}
$$


Combining these we find 


$$D^3\xi + \omega^2 D\xi = 0 \tag{5.301}$$
where 

$$\omega^2 = 3\Omega^2 + \frac{D^2V(R_0)}{m}. \tag{5.302}$$


Thus we have a simple harmonic oscillator with frequency $\omega$ as one of the components of the solution. The general solution has three parts


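$$
\begin{bmatrix}\xi(t)\\ \eta(t)\end{bmatrix}
= \eta_0 \begin{bmatrix}0\\ 1\end{bmatrix}
+ \xi_0 \begin{bmatrix}1\\ -At\end{bmatrix}
+ C_0 \begin{bmatrix}\cos(\omega t + \gamma_0)\\ -\dfrac{2\Omega}{\omega}\sin(\omega t + \gamma_0)\end{bmatrix}
\tag{5.303--5.305}
$$
where
$$A = \frac{1}{2\Omega}\left(\Omega^2 - \frac{D^2V(R_0)}{m}\right). \tag{5.306}$$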


The constants $\eta_0$, $\xi_0$, $C_0$, and $\gamma_0$ are determined by the initial conditions. If $C_0 = 0$ the particle of interest is on a circular trajectory, but not necessarily the same one as the reference trajectory. If $C_0 = 0$ and $\xi_0 = 0$ we have a “fellow traveler,” a particle in the same circular orbit as the reference orbit, but with different phase. If $C_0 = 0$ and $\eta_0 = 0$ we have a particle in a circular orbit that is interior or exterior to the reference orbit and shearing away from the reference orbit. The shearing is due to the fact that the angular velocity for a circular orbit varies with the radius. The constant $A$ gives the rate of shearing at each radius. If both $\eta_0 = 0$ and $\xi_0 = 0$ but $C_0 \neq 0$ then we have “epicyclic motion.” A particle in a nearly circular orbit may be seen to move in an ellipse around the circular reference orbit. The ellipse will be elongated in the direction of circular motion by the factor $2\Omega/\omega$ and it will rotate in the direction opposite the direction of the circular motion. The initial phase of the epicycle is $\gamma_0$. Of course, any combination of these solutions may exist.
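The claim that (5.303) through (5.306) solve the linearized system can be tested by substituting numerical derivatives into (5.299) and (5.300). This Python sketch uses central differences; the Kepler-like values $\Omega = 1$ and $D^2V(R_0)/m = -2$ (which give $\omega = \Omega$ and $A = 3\Omega/2$) and the four constants are illustrative assumptions.

```python
import math

m, Omega, d2V = 1.0, 1.0, -2.0                       # Kepler-like values (assumed)
omega = math.sqrt(3 * Omega**2 + d2V / m)            # (5.302)
A = (Omega**2 - d2V / m) / (2 * Omega)               # (5.306)
eta0, xi0, C0, gamma0 = 0.1, 0.02, 0.05, 0.3         # arbitrary constants

def solution(t):
    # general solution (5.303)-(5.305)
    phase = omega * t + gamma0
    xi = xi0 + C0 * math.cos(phase)
    eta = eta0 - A * xi0 * t - (2 * Omega / omega) * C0 * math.sin(phase)
    return xi, eta

def residuals(t, h=1e-3):
    # plug central-difference derivatives into (5.299) and (5.300)
    (xm, em), (x0, e0), (xp, ep) = solution(t - h), solution(t), solution(t + h)
    dxi, deta = (xp - xm) / (2 * h), (ep - em) / (2 * h)
    d2xi, d2eta = (xp - 2 * x0 + xm) / h**2, (ep - 2 * e0 + em) / h**2
    r1 = d2xi - 2 * Omega * deta - (Omega**2 - d2V / m) * x0
    r2 = d2eta + 2 * Omega * dxi
    return r1, r2

worst = max(abs(r) for k in range(100) for r in residuals(0.1 * k))
print(worst)        # small, limited by the finite-difference step
```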

The epicyclic frequency $\omega$ and the shearing rate $A$ are determined by the force law (the radial derivative of the potential energy). For a force law proportional to a power of the radius


$$F \propto r^{1-n} \tag{5.307}$$


the epicyclic frequency is related to the orbital frequency by 


$$\frac{\omega}{\Omega} = 2\sqrt{1 - \frac{n}{4}} \tag{5.308}$$
and the shearing rate is
$$\frac{A}{\Omega} = \frac{n}{2}. \tag{5.309}$$


For a few particular integer force laws we see: 


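$$
\begin{array}{c|ccccc}
F \propto & r & 1 & r^{-1} & r^{-2} & r^{-3}\\
\hline
n & 0 & 1 & 2 & 3 & 4\\
\omega/\Omega & 2 & \sqrt{3} & \sqrt{2} & 1 & 0\\
A/\Omega & 0 & 1/2 & 1 & 3/2 & 2
\end{array}
$$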
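The table entries, and the ratios quoted in the captions of figures 5.6 and 5.7, follow from (5.308) and (5.309). The Python sketch below also recomputes them from the general formulas (5.274), (5.302), and (5.306) for an assumed force law $DV(r) = r^{1-n}$, as a cross-check of the power-law specialization. It is meant only for $0 \le n \le 4$, where $\omega$ is real; the mass and reference radius in the cross-check are arbitrary.

```python
import math

def ratios_closed_form(n):
    # omega/Omega from (5.308) and A/Omega from (5.309), for F proportional to r**(1 - n)
    return 2 * math.sqrt(1 - n / 4), n / 2

def ratios_from_potential(n, m=2.0, R0=1.7):
    # the same ratios from (5.274), (5.302), (5.306) with DV(r) = r**(1 - n)
    dV = R0 ** (1 - n)
    d2V = (1 - n) * R0 ** (-n)
    Omega = math.sqrt(dV / (m * R0))
    omega = math.sqrt(3 * Omega**2 + d2V / m)
    A = (Omega**2 - d2V / m) / (2 * Omega)
    return omega / Omega, A / Omega

for n in range(5):
    w, a = ratios_closed_form(n)
    print(f"n = {n}:  omega/Omega = {w:.4f},  A/Omega = {a}")
print(ratios_closed_form(3.3)[0], ratios_closed_form(1.75)[0])   # figures 5.6 and 5.7
```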


We can get some insight into the kinds of orbits that are pro- 
duced by the epicyclic approximation by examining a few exam- 
ples. For some force laws we have integer ratios of epicyclic fre- 
quency to orbital frequency. In those cases we have closed orbits. 
For an inverse-square force law (n = 3) we get elliptical orbits 
with the center of the field at a focus of the ellipse. Figure 5.3 
shows how an approximation to such an orbit can be constructed 
by superposition of the motion on an elliptical epicycle with the 
motion of the same frequency on a circle. If the force is propor- 
tional to the radius (n = 0) we get a two-dimensional harmonic 
oscillator. Here the epicyclic frequency is twice the orbital fre- 
quency. Figure 5.4 shows how this yields elliptical orbits that are 
centered on the source of the central force. An orbit is closed 
when $\omega/\Omega$ is a rational fraction. If the force is proportional to the $-3/4$ power of the radius the epicyclic frequency is 3/2 the orbital frequency. This yields a 3-lobed pattern that can be seen
in figure 5.5. For other force laws the orbits predicted by this 
analysis are multi-lobed patterns produced by precessing approx- 
imate ellipses. Most of the cases have incommensurate epicyclic 
and orbital frequencies, leading to orbits that do not close in finite 
time. 

The epicyclic approximation gives a very good idea of what actual orbits look like. Figure 5.6, drawn by numerically integrating the original rectangular equations of motion for a particle in the field, shows the rosette-type picture characteristic of incommensurate epicyclic and orbital frequencies for an $F = -r^{-2.3}$ force law.
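The rosette of figure 5.6 can be quantified with a small experiment. The Python sketch below integrates (5.263) through (5.266) with $m = 1$ and an attractive force of magnitude $r^{1-n}$, starting at a pericenter just outside the circular orbit of radius 1, and estimates $\omega/\Omega$ as $2\pi$ divided by the angle swept between successive pericenters. The initial state, the Runge-Kutta integrator, the step size, and the linear interpolation of the pericenter are our assumptions; for a small perturbation the estimate should approach $2\sqrt{1 - n/4}$.

```python
import math

def rhs(s, n):
    # (5.263)-(5.266) with m = 1 and DV(r) = r**(1 - n)
    x, y, px, py = s
    r = math.hypot(x, y)
    f = r ** (1 - n) / r
    return (px, py, -f * x, -f * y)

def rk4(s, h, n):
    # one classical Runge-Kutta step
    k1 = rhs(s, n)
    k2 = rhs([a + h / 2 * b for a, b in zip(s, k1)], n)
    k3 = rhs([a + h / 2 * b for a, b in zip(s, k2)], n)
    k4 = rhs([a + h * b for a, b in zip(s, k3)], n)
    return [a + h / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(s, k1, k2, k3, k4)]

def measured_ratio(n, kick=1.003, h=0.002, t_max=40.0):
    # start at r = 1 with speed slightly above the circular value 1, so r = 1 is a pericenter
    s, phi, pr_old, t = [1.0, 0.0, 0.0, kick], 0.0, 0.0, 0.0
    while t < t_max:
        s_new = rk4(s, h, n)
        dphi = math.atan2(s_new[1], s_new[0]) - math.atan2(s[1], s[0])
        dphi = (dphi + math.pi) % (2 * math.pi) - math.pi     # unwrap the angle step
        pr_new = (s_new[0] * s_new[2] + s_new[1] * s_new[3]) / math.hypot(s_new[0], s_new[1])
        if pr_old < 0 <= pr_new:                              # p_r turns positive: pericenter
            frac = -pr_old / (pr_new - pr_old)                # linear interpolation in the step
            return 2 * math.pi / (phi + frac * dphi)
        s, phi, pr_old, t = s_new, phi + dphi, pr_new, t + h
    raise RuntimeError("no second pericenter found before t_max")

for n in (1.75, 3.3):
    print(n, measured_ratio(n), 2 * math.sqrt(1 - n / 4))
```

With n = 3.3 and n = 7/4 the estimates should land close to 0.8367 and 1.5, the ratios quoted for figures 5.6 and 5.7.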

We can directly compare a numerically integrated system with 
one of our epicyclic approximations. For example the result of 
numerically integrating our $F \propto r^{-3/4}$ system is very similar to
Figure 5.3 Epicyclic construction of an approximate orbit for $F \propto r^{-2}$. The large dotted circle is the reference circular orbit. The dotted ellipses are the epicycles. The epicycles are twice as long as they
are wide. The solid ellipse is the approximate trajectory produced by a 
particle moving on the epicycles. The sense of orbital motion is counter- 
clockwise, and the epicycles are rotating clockwise. The arrows represent 
the increment of velocity contributed by the epicycle to the circular ref- 
erence orbit. 


Figure 5.4 Epicyclic construction of an approximate orbit for $F \propto r$.
The large dotted circle is the reference circular orbit. The small dotted 
circles are the epicycles. The solid ellipse is the approximate trajectory 
produced by a particle moving on the epicycles. The sense of orbital 
motion is counterclockwise, and the epicycles are rotating clockwise. The 
arrows represent the increment of velocity contributed by the epicycle 
to the circular reference orbit. 
Figure 5.5 Epicyclic construction of an approximate orbit for $F \propto r^{-3/4}$. The large dotted circle is the reference circular orbit. The dotted
ellipses are the epicycles. The epicycles are in a 4 : 3 ratio of length 
to width. The solid ellipse is the approximate trajectory produced by a 
particle moving on the epicycles. The sense of orbital motion is counter- 
clockwise, and the epicycles are rotating clockwise. The arrows represent 
the increment of velocity contributed by the epicycle to the circular ref- 
erence orbit. 


Figure 5.6 The numerically integrated orbit of a particle with a force law $F \propto r^{-2.3}$. For this law the ratio of the epicyclic frequency to the orbital frequency is about 0.83666, close to 5/6 but not quite. This is
manifest in the nearly 5-fold symmetry of the rosette-like shape and the 
fact that one must cross approximately six orbits to get from the inside 
to the outside of the rosette. 
the picture we obtained by epicycles. (See figure 5.7 and compare
it with figure 5.5.) 


Figure 5.7 The numerically integrated orbit of a particle with a force law $F \propto r^{-3/4}$. For this law the ratio of the epicyclic frequency to the
orbital frequency is exactly 3/2. This is manifest in the 3-fold symmetry 
of the rosette-like shape and the fact that one must cross two orbits to 
get from the inside to the outside of the rosette. 


Exercise 5.16: Collapsing orbits 


What exactly happens as the force law becomes more steep? Investigate 
this by sketching the contours of the Hamiltonian in $(r, p_r)$ space, for
various values of the force-law exponent, n. For what values of n are 
there stable circular orbits? In the case that there are no stable circular 
orbits what happens to circular and other noncircular orbits? How are 
these results consistent with Liouville’s theorem and the non-existence 
of attractors in Hamiltonian systems?


5.6.5 Classical “Gauge” Transformations 


The addition of a total time derivative to a Lagrangian leads to the 
same Lagrange equations. However, the two Lagrangians have dif- 
ferent momenta, and they lead to different Hamilton’s equations. 
Here, we find out how to represent the corresponding canonical 
transformation with a generating function. 

Let’s restate the result about total time derivatives and La- 
grangians from the first chapter. Consider some function G(t, q) 
of time and coordinates. We have shown that if $L$ and $L'$ are
related by 


$$L'(t, q, \dot q) = L(t, q, \dot q) + \partial_0 G(t, q) + \partial_1 G(t, q)\,\dot q \tag{5.310}$$


then the Lagrange equations of motion are the same. The gener- 
alized coordinates used in the two Lagrangians are the same, but 
the momenta conjugate to the coordinates are different. In the 
usual way, define 


$$p(t, q, \dot q) = \partial_2 L(t, q, \dot q) \tag{5.311}$$
and
$$p'(t, q, \dot q) = \partial_2 L'(t, q, \dot q). \tag{5.312}$$
So we have
$$p'(t, q, \dot q) = p(t, q, \dot q) + \partial_1 G(t, q). \tag{5.313}$$


Evaluated on a trajectory, we have 
$$p'(t) = p(t) + \partial_1 G(t, q(t)). \tag{5.314}$$


This transformation is a special case of an $F_2$-type transformation.
Let 


$$F_2(t, q, p') = q p' - G(t, q), \tag{5.315}$$


then the associated transformation is 


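$$
\begin{aligned}
q' &= \partial_2 F_2(t, q, p') = q && (5.316)\\
p &= \partial_1 F_2(t, q, p') = p' - \partial_1 G(t, q) && (5.317)\\
H'(t, q', p') &= H(t, q, p) + \partial_0 F_2(t, q, p') = H(t, q, p) - \partial_0 G(t, q). && (5.318)
\end{aligned}
$$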


Explicitly, the new Hamiltonian is 
$$H'(t, q', p') = H(t, q', p' - \partial_1 G(t, q')) - \partial_0 G(t, q'), \tag{5.319}$$


where we have used the fact that q = q’. The transformation is 
interesting in that the coordinate transformation is the identity 
transformation, but the new and old momenta are not the same, 
even in the case in which $G$ has no explicit time dependence.
Suppose we have a Hamiltonian of the form
$$H(t, x, p) = \frac{p^2}{2m} + V(x), \tag{5.320}$$


then the transformed Hamiltonian is 


$$H'(t, x', p') = \frac{(p' - \partial_1 G(t, x'))^2}{2m} + V(x') - \partial_0 G(t, x'). \tag{5.321}$$


We see that this transformation may be used to modify terms in 
the Hamiltonian that are linear in the momenta. Starting from H 
the transformation introduces linear momentum terms; starting 
from H’ the transformation eliminates the linear terms. 

We illustrate the use of this transformation with the driven 
pendulum. The Hamiltonian for the driven pendulum was derived 
automatically in section 3.1.1. We repeat the result here (cleaned 
up a bit) 


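$$
\begin{aligned}
H(t, \theta, p_\theta) &= \frac{p_\theta^2}{2ml^2} - glm\cos\theta\\
&\quad + gm\,y_s(t) - \frac{p_\theta}{l}\sin\theta\,Dy_s(t) - \frac{m}{2}(\cos\theta)^2 (Dy_s(t))^2,
\end{aligned} \tag{5.322}
$$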


where $y_s$ is the drive function. The Hamiltonian is rather messy,
and includes a term that is linear in the angular momentum with 
a coefficient that depends on both the angular coordinate and the 
time. Let’s see what happens if we apply our transformation to 
the problem to eliminate the linear term. We can identify the 
transformation function G by requiring that the linear term in 
momentum is killed: 


$$G(t, \theta) = ml\cos\theta\,Dy_s(t). \tag{5.323}$$
The transformed momentum is
$$p'_\theta = p_\theta - ml\sin\theta\,Dy_s(t), \tag{5.324}$$


and the transformed Hamiltonian is 


$$
\begin{aligned}
H'(t, \theta, p'_\theta) &= \frac{(p'_\theta)^2}{2ml^2} - ml(g + D^2 y_s(t))\cos\theta\\
&\quad + gm\,y_s(t) - \frac{m}{2}(Dy_s(t))^2.
\end{aligned} \tag{5.325}
$$
Dropping the last two terms, which do not affect the equations of 
motion, we find 


$$H'(t, \theta, p'_\theta) = \frac{(p'_\theta)^2}{2ml^2} - ml(g + D^2 y_s)\cos\theta. \tag{5.326}$$


So we have found, by a straightforward canonical transformation, 
a Hamiltonian for the driven pendulum with the rather simple 
form of a pendulum with gravitational acceleration that is mod- 
ified by the acceleration of the pivot. It is, in fact, the Hamilto- 
nian that corresponds to the alternate form of the Lagrangian for 
the driven pendulum we found earlier by inspection (see equation 
1.120). Here the derivation is by a simple canonical transforma- 
tion, motivated by a desire to eliminate unwanted terms that are 
linear in the momentum. 
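The equivalence of the two Hamiltonians can be tested by integrating both sets of Hamilton's equations. The Python sketch below assumes the drive $y_s(t) = a\cos\nu t$ and illustrative parameter values. It starts both systems at the same angle with momenta related by $p'_\theta = p_\theta - ml\sin\theta\,Dy_s(t)$, which is what (5.317) gives for $G(t, \theta) = ml\cos\theta\,Dy_s(t)$, the choice that removes the linear term and leads to (5.325). The angles should then agree at all later times, and the momentum relation should persist.

```python
import math

m, l, g = 1.0, 1.0, 9.8
a, nu = 0.1, 3.0                               # assumed drive y_s(t) = a cos(nu t)
Dys = lambda t: -a * nu * math.sin(nu * t)
D2ys = lambda t: -a * nu**2 * math.cos(nu * t)

def f_original(t, s):
    # Hamilton's equations derived from (5.322)
    th, p = s
    dth = p / (m * l**2) - math.sin(th) * Dys(t) / l
    dp = (-g * l * m * math.sin(th) + p / l * math.cos(th) * Dys(t)
          - m * math.sin(th) * math.cos(th) * Dys(t)**2)
    return dth, dp

def f_gauge(t, s):
    # Hamilton's equations derived from (5.326)
    th, pp = s
    return pp / (m * l**2), -m * l * (g + D2ys(t)) * math.sin(th)

def rk4(f, t, s, h):
    # one classical Runge-Kutta step for D s = f(t, s)
    k1 = f(t, s)
    k2 = f(t + h / 2, [x + h / 2 * k for x, k in zip(s, k1)])
    k3 = f(t + h / 2, [x + h / 2 * k for x, k in zip(s, k2)])
    k4 = f(t + h, [x + h * k for x, k in zip(s, k3)])
    return [x + h / 6 * (b + 2 * c + 2 * d + e) for x, b, c, d, e in zip(s, k1, k2, k3, k4)]

t, h = 0.2, 1e-3
th0, p0 = 0.5, 0.0
s1 = [th0, p0]
s2 = [th0, p0 - m * l * math.sin(th0) * Dys(t)]      # transformed momentum
for _ in range(2000):
    s1 = rk4(f_original, t, s1, h)
    s2 = rk4(f_gauge, t, s2, h)
    t += h
print(s1[0] - s2[0])
```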


Exercise 5.17: Construction of generating functions 
Suppose that canonical transformations $C_a$ and $C_b$ are generated by $F_1$-class generating functions $F_{1a}$ and $F_{1b}$.


a. Show that the generating function for the inverse transformation of $C_a$ is $-F_{1a}$.


b. Show that the generating function for the composition transformation $C_a \circ C_b$ is $F_{1a} + F_{1b}$, using the fact that the generating function does not depend on the intermediate point.


Exercise 5.18: Linear canonical transformations 


We consider systems with two degrees of freedom, and transformations 
for which the Hamiltonian transforms by composition. 


a. Consider the linear canonical transformations that are generated by 
$F_2(t; x_1, x_2; p'_1, p'_2) = p'_1 a x_1 + p'_1 b x_2 + p'_2 c x_1 + p'_2 d x_2$.


Show that these transformations are just the point transformations, and that the corresponding $F_1$ is zero.


b. Other linear canonical transformations can be generated by 
$F_1(t; x_1, x_2; x'_1, x'_2) = x'_1 a x_1 + x'_1 b x_2 + x'_2 c x_1 + x'_2 d x_2$.


Surely we can make even more generators by constructing $F_3$- and $F_4$-class transformations analogously. Are all of the linear canonical transformations obtainable in this way? If not, show one that cannot be so
generated. 
c. Can all linear canonical transformations be generated by compositions of transformations generated by the functions shown in parts a
and b above? 


d. How many independent parameters are necessary to specify all pos- 
sible linear canonical transformations for systems with two degrees of 
freedom? 


Exercise 5.19: Integral invariants 


Consider the linear canonical transformation for a system with two de- 
grees of freedom generated by the function: 


$$F_1(t; x_1, x_2; x'_1, x'_2) = x'_1 a x_1 + x'_1 b x_2 + x'_2 c x_1 + x'_2 d x_2,$$


and the general parallelogram, with a vertex at the origin and with 
adjacent sides starting at the origin and extending to the phase-space 
points $(x_{1a}, x_{2a}, p_{1a}, p_{2a})$ and $(x_{1b}, x_{2b}, p_{1b}, p_{2b})$.


a. Find the area of the given parallelogram, and find the area of the 
target parallelogram under the canonical transformation. Notice that 
the area of the parallelogram is not preserved. 


b. Find the areas of the projections of the given parallelogram, and the 
areas of the projections of the target under canonical transformation. 
Show that the sum of the areas of the projections on the action-like 
planes is preserved. 


Exercise 5.20: Standard map generating function 


Find a generating function for the standard map (see exercise 5.5). 


Exercise 5.21: An incorrect derivation 


The following is an incorrect derivation of the rules for the generating 
function. As you read it try to find the bug. Write an essay on this 
subject. What is actually the problem? 

Let L and L’ be the Lagrangians expressed in two coordinate systems 
for which the path is q and q’, respectively. We further assume that the 
value of L and L’ on the path differ by the time derivative of a function 
of the configuration and time evaluated on the path. This function 
can be written in terms of the path expressed in terms of both sets of 
coordinates. Consider the function $F_1(t, q, q')$, and its value on the path


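$F_1(t) = F_1(t, q(t), q'(t))$ at time $t$. The time derivative of $F_1$ is
$$
\begin{aligned}
DF_1(t) &= \partial_1 F_1(t, q(t), q'(t))\,Dq(t)\\
&\quad + \partial_2 F_1(t, q(t), q'(t))\,Dq'(t)\\
&\quad + \partial_0 F_1(t, q(t), q'(t)).
\end{aligned} \tag{5.327}
$$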


The relation between the Lagrangians is therefore 


$$
\begin{aligned}
L(t, q, \dot q) - L'(t, q', \dot q')
&= \partial_1 F_1(t, q, q')\,\dot q + \partial_2 F_1(t, q, q')\,\dot q' + \partial_0 F_1(t, q, q').
\end{aligned} \tag{5.328}
$$
Now rewrite the Lagrangians in terms of the Hamiltonians 
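$$
\begin{aligned}
&[p\dot q - H(t, q, p)] - [p'\dot q' - H'(t, q', p')]\\
&\qquad = \partial_1 F_1(t, q, q')\,\dot q + \partial_2 F_1(t, q, q')\,\dot q' + \partial_0 F_1(t, q, q'),
\end{aligned} \tag{5.329}
$$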


where $p$ is determined by $t$, $q$, and $\dot q$ and the Lagrangian $L$. Similar
relations hold for the primed functions. Let’s collect terms 


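$$
\begin{aligned}
0 &= [p - \partial_1 F_1(t, q, q')]\,\dot q\\
&\quad - [p' + \partial_2 F_1(t, q, q')]\,\dot q'\\
&\quad - H(t, q, p) + H'(t, q', p') - \partial_0 F_1(t, q, q').
\end{aligned} \tag{5.330}
$$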


If the relations (5.148-5.150) hold then each of these lines is inde- 
pendently zero, apparently verifying that the Lagrangians differ by a 
total time derivative. If this were true then the equations of motion 
would be preserved and the transformation would have been shown to 
be canonical.²³


5.7 Time Evolution is Canonical 


In this section we demonstrate that time evolution generates a 
canonical transformation: if we consider all possible initial states 
of a Hamiltonian system, and we follow all of the trajectories for 
the same time interval, then the map from the initial state to the 
final state of each trajectory is a canonical transformation. 

We use time evolution to generate a transformation 


$$(t, q, p) = C_\Delta(t', q', p') \tag{5.331}$$


that is obtained in the following way. Let $\sigma(t) = (t, q(t), p(t))$ be a solution of Hamilton’s equations. The transformation $C_\Delta$ satisfies


$$C_\Delta(\sigma(t)) = \sigma(t + \Delta), \tag{5.332}$$


23 Many texts further muddy the matter by introducing an unjustified independence argument here: they argue that because $\dot q$ and $\dot q'$ are independent the relations (5.148–5.150) must hold. This is silly, because $p$ and $p'$ are functions of $\dot q$ and $\dot q'$, respectively, so there are implied dependencies of the velocities in many places, and it is unjustified to separately set pieces of this equation to zero. However, even setting this problem aside, the derivation of the fact that the transformation is canonical is fallacious.
or, equivalently,
$$(t + \Delta, q(t + \Delta), p(t + \Delta)) = C_\Delta(t, q(t), p(t)). \tag{5.333}$$

Notice that $C_\Delta$ changes the time component. This is the first transformation of this kind that we have considered.²⁴

Given a state $(t', q', p')$ we find the phase-space path $\sigma$ emanating from this state as an initial condition, satisfying
$$
\begin{aligned}
q' &= q(t'),\\
p' &= p(t').
\end{aligned} \tag{5.334}
$$


The value $(t, q, p)$ of $C_\Delta(t', q', p')$ is then $(t' + \Delta, q(t' + \Delta), p(t' + \Delta))$.

Time evolution is canonical if the transformation $C_\Delta$ is symplectic and if the Hamiltonian transforms in an appropriate manner. The transformation $C_\Delta$ is symplectic if the bilinear antisymmetric form $\omega$ is invariant (see equation 5.73) for a general pair of linearized state variations with zero time component.

Let $\zeta'$ be an increment with zero time component of the state $(t', q', p')$. The linearized increment in the value of $C_\Delta(t', q', p')$ is $\zeta = DC_\Delta(t', q', p')\,\zeta'$. The image of the increment is obtained by
multiplying the increment by the derivative of the transformation. 
On the other hand, the transformation is obtained by time evolu- 
tion, so the image of the increment can also be found by the time 
evolution of the linearized variational system. Let 


$$
\begin{aligned}
\zeta(t) &= (0, \zeta_q(t), \zeta_p(t)),\\
\zeta'(t) &= (0, \zeta'_q(t), \zeta'_p(t))
\end{aligned} \tag{5.335}
$$
be variations of the state path $\sigma(t) = (t, q(t), p(t))$; then


$$
\begin{aligned}
\zeta(t + \Delta) &= DC_\Delta(t, q(t), p(t))\,\zeta(t),\\
\zeta'(t + \Delta) &= DC_\Delta(t, q(t), p(t))\,\zeta'(t).
\end{aligned} \tag{5.336}
$$


The symplectic requirement is 


$$\omega(\zeta(t), \zeta'(t)) = \omega(\zeta(t + \Delta), \zeta'(t + \Delta)). \tag{5.337}$$


24 Our theorems about which transformations are canonical are still valid, because they only required that the derivative of the independent variable be 1.
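This must be true for arbitrary $\Delta$, so it is satisfied if the following quantity is constant:

$$
\begin{aligned}
A(t) &= \omega(\zeta(t), \zeta'(t))\\
&= P(\zeta'(t))\,Q(\zeta(t)) - P(\zeta(t))\,Q(\zeta'(t))\\
&= \zeta'_p(t)\,\zeta_q(t) - \zeta_p(t)\,\zeta'_q(t).
\end{aligned} \tag{5.338}
$$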


We compute the derivative: 


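$$
\begin{aligned}
DA(t) &= D\zeta'_p(t)\,\zeta_q(t) + \zeta'_p(t)\,D\zeta_q(t)\\
&\quad - D\zeta_p(t)\,\zeta'_q(t) - \zeta_p(t)\,D\zeta'_q(t).
\end{aligned} \tag{5.339}
$$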


Using Hamilton’s equations the variations satisfy 


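$$
\begin{aligned}
D\zeta_q(t) &= \partial_1\partial_2 H(t, q(t), p(t))\,\zeta_q(t) + \partial_2\partial_2 H(t, q(t), p(t))\,\zeta_p(t),\\
D\zeta_p(t) &= -\partial_1\partial_1 H(t, q(t), p(t))\,\zeta_q(t) - \partial_2\partial_1 H(t, q(t), p(t))\,\zeta_p(t).
\end{aligned} \tag{5.340}
$$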


Substituting these in $DA$ and collecting terms we find²⁵
$$DA(t) = 0. \tag{5.341}$$


We conclude that time evolution generates a phase space trans- 
formation with symplectic derivative. 
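The conclusion $DA = 0$ can be illustrated numerically for one degree of freedom. The Python sketch below approximates $C_\Delta$ with fixed-step Runge-Kutta integration for an assumed driven-pendulum Hamiltonian $H(t, q, p) = p^2/2 - \cos q - 0.3\,q\cos 2t$, pushes the unit variations $\zeta = (0, 1, 0)$ and $\zeta' = (0, 0, 1)$ forward by central differences, and evaluates (5.338) on their images. Before evolution the quantity equals 1; after evolution it should still be 1 up to integration and differencing error.

```python
import math

def rhs(t, q, p):
    # Hamilton's equations for the assumed H(t, q, p) = p**2/2 - cos(q) - 0.3*q*cos(2t)
    return p, -math.sin(q) + 0.3 * math.cos(2 * t)

def evolve(t, q, p, delta, steps=200):
    # approximate C_Delta with fixed-step classical Runge-Kutta
    h = delta / steps
    for _ in range(steps):
        k1 = rhs(t, q, p)
        k2 = rhs(t + h / 2, q + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = rhs(t + h / 2, q + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = rhs(t + h, q + h * k3[0], p + h * k3[1])
        q, p = (q + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                p + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        t += h
    return t, q, p

def A_after(t, q, p, delta, eps=1e-5):
    # (5.338) evaluated on the images of zeta = (0, 1, 0) and zeta' = (0, 0, 1)
    def image(dq, dp):
        _, qa, pa = evolve(t, q + eps * dq, p + eps * dp, delta)
        _, qb, pb = evolve(t, q - eps * dq, p - eps * dp, delta)
        return (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    zeta_q, zeta_p = image(1.0, 0.0)
    zetap_q, zetap_p = image(0.0, 1.0)
    return zetap_p * zeta_q - zeta_p * zetap_q

print(A_after(0.3, 1.0, 0.2, 1.3))
```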

To make a canonical transformation we must specify how the 
Hamiltonian transforms. The same Hamiltonian describes the 
evolution of a state and a time-advanced state because the lat- 
ter is just another state. Thus the transformed Hamiltonian is 
the same as the original Hamiltonian. 


Liouville’s theorem, again 

We deduced that volumes in phase space are preserved by time 
evolution by showing that the divergence of the phase flow is zero, 
using the equations of motion (see section 3.8). We can also deduce that volumes in phase space are preserved by the evolution from the fact that time evolution is a canonical transformation.


25 Partial derivatives of structured arguments do not generally commute, so 
this deduction is not as simple as it may appear. It is helpful to introduce 
component indices and consider the equation componentwise. 
We have shown that phase space volume is preserved for symplectic transformations. Now we have shown that the transformation generated by time evolution is a symplectic transformation.
Therefore, the transformation generated by time-evolution pre- 
serves phase space volume. This is an alternate proof of Liouville’s 
theorem. 


Another time-evolution transformation 
There is another canonical transformation that can be constructed 
from time evolution. We define the transformation $C'_\Delta$ such that


$$C'_\Delta = C_\Delta \circ S_{-\Delta}, \tag{5.342}$$


where $S_\Delta(a, b, c) = (a + \Delta, b, c)$ shifts the time of a phase-space state.²⁶ More explicitly, given a state $(t, q', p')$, we evolve the state that is obtained by subtracting $\Delta$ from $t$; that is, we take the state $(t - \Delta, q', p')$ as an initial state for evolution by Hamilton’s equations. The state path $\sigma$ satisfies


$$
\begin{aligned}
\sigma(t - \Delta) &= (t - \Delta, q(t - \Delta), p(t - \Delta))\\
&= (t - \Delta, q', p').
\end{aligned} \tag{5.343}
$$


The output of the transformation is the state 


$$(t, q, p) = \sigma(t) = (t, q(t), p(t)). \tag{5.344}$$
The transformation satisfies
$$(t, q(t), p(t)) = C'_\Delta(t, q(t - \Delta), p(t - \Delta)). \tag{5.345}$$


The arguments of $C'_\Delta$ are not a consistent phase-space state; the time argument must be decremented by $\Delta$, and then the transformation is made by evolution of this state.

Why is this a good idea? Our usual canonical transforma- 
tions do not change the time component. This modified time- 
evolution transformation is thus of the form discussed previously. 


26 The transformation $S_\Delta$ is an identity on the $qp$ components, so it is symplectic. Although it adjusts the time, it is not a time-dependent transformation in that the $qp$ components do not depend upon the time. Thus, if we adjust the Hamiltonian by composition with $S_\Delta$ we have a canonical transformation.
The resulting time-evolution transformation is canonical, and in
the usual form:

$$(t, q, p) = C'_\Delta(t, q', p'). \tag{5.346}$$


This transformation can also be extended to be a canonical
transformation, with an appropriate adjustment of the Hamiltonian.
The Hamiltonian $H'_\Delta$ that gives the correct Hamilton's
equations at the transformed phase-space point is the original
Hamiltonian composed with a function that decrements the
independent variable by Δ:

$$H'_\Delta(t, q, p) = H(t - \Delta, q, p), \tag{5.347}$$

or

$$H'_\Delta = H \circ S_{-\Delta}. \tag{5.348}$$

Notice that if H is time independent then $H'_\Delta = H$.

Let us assume we have a procedure ((C delta-t) state) that
implements a time-evolution transformation of the given state
with time interval delta-t.

We can get a procedure ((Cp delta-t) state) that implements
$C'_\Delta$ from the procedure ((C delta-t) state) that implements
$C_\Delta$ using the procedure


(define ((C->Cp C) delta-t)
  (compose (C delta-t) (shift-t (- delta-t))))


where shift-t implements $S_\Delta$:


(define ((shift-t delta-t) state)
  (up
   (+ (time state) delta-t)
   (coordinate state)
   (momentum state)))


To complete the canonical transformation we have a procedure
that transforms the Hamiltonian:


(define ((H->Hp delta-t) H)
  (compose H (shift-t (- delta-t))))


So both C and C' can be used to make canonical transformations
by specifying how the old and new Hamiltonians are related. For
$C_\Delta$ the Hamiltonian is unchanged. For $C'_\Delta$ the Hamiltonian is
time-shifted.
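As an illustration only, the three Scheme procedures above can be transcribed into Python, with a state represented as a tuple (t, q, p). The names shift_t, C_to_Cp, and H_to_Hp mirror the Scheme names. The free-particle evolution C_free and the time-dependent Hamiltonian H_example are hypothetical test cases of our own and are not part of the text.

```python
from typing import Callable, Tuple

State = Tuple[float, float, float]   # (t, q, p) for one degree of freedom

def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

def shift_t(delta_t: float) -> Callable[[State], State]:
    """S_delta: add delta_t to the time slot; leave q and p alone."""
    def shift(state: State) -> State:
        t, q, p = state
        return (t + delta_t, q, p)
    return shift

def C_to_Cp(C):
    """Given C(delta_t) -> state map, return Cp(delta_t) = C(delta_t) o S_{-delta_t}."""
    return lambda delta_t: compose(C(delta_t), shift_t(-delta_t))

def H_to_Hp(delta_t: float):
    """Return the map H -> H o S_{-delta_t}."""
    return lambda H: compose(H, shift_t(-delta_t))

# Hypothetical test case: free particle of unit mass,
# C_delta(t, q, p) = (t + delta, q + p*delta, p).
def C_free(delta_t: float) -> Callable[[State], State]:
    def evolve(state: State) -> State:
        t, q, p = state
        return (t + delta_t, q + p * delta_t, p)
    return evolve

Cp_free = C_to_Cp(C_free)

# Hypothetical time-dependent Hamiltonian, used only to exercise H_to_Hp.
def H_example(state: State) -> float:
    t, q, p = state
    return p**2 / 2 + t * q

print(C_free(2.0)((1.0, 0.5, 3.0)))              # (3.0, 6.5, 3.0)
print(Cp_free(2.0)((1.0, 0.5, 3.0)))             # (1.0, 6.5, 3.0): time slot unchanged
print(H_to_Hp(2.0)(H_example)((1.0, 0.5, 3.0)))  # H(-1.0, 0.5, 3.0) = 4.0
```

The output of Cp_free keeps the input time, which is what makes it a transformation "of the usual form". H_to_Hp leaves a time-independent Hamiltonian unchanged.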


Exercise 5.22: Verification 


The condition (5.19) that Hamilton's equations are preserved for $C_\Delta$ is

$$D_s H \circ C_\Delta = DC_\Delta \cdot D_s H,$$

and the condition (5.19) that Hamilton's equations are preserved for $C'_\Delta$
is

$$D_s H \circ C'_\Delta = DC'_\Delta \cdot D_s H'_\Delta.$$

Verify that these conditions are satisfied.


Exercise 5.23: Driven harmonic oscillator 


We can use the simple driven harmonic oscillator to illustrate that time 
evolution yields a symplectic transformation which can be extended to 
be canonical in two ways. We use the driven harmonic oscillator because 
its solution can be compactly expressed in explicit form. 

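Suppose that we have a harmonic oscillator with natural frequency
$\omega_0$ driven by a periodic sinusoidal drive of frequency ω and amplitude
α. The Hamiltonian we will consider is

$$H(t, q, p) = \tfrac{1}{2}p^2 + \tfrac{1}{2}\omega_0^2 q^2 - \alpha q \cos \omega t.$$

The general solution for a given initial state $(t_0, q_0, p_0)$ evolved for a time
Δ is

$$\begin{bmatrix} q(t_0 + \Delta) \\ p(t_0 + \Delta)/\omega_0 \end{bmatrix} = \begin{bmatrix} \cos \omega_0 \Delta & \sin \omega_0 \Delta \\ -\sin \omega_0 \Delta & \cos \omega_0 \Delta \end{bmatrix} \begin{bmatrix} q_0 - \alpha' \cos \omega t_0 \\ (1/\omega_0)(p_0 + \alpha' \omega \sin \omega t_0) \end{bmatrix} + \begin{bmatrix} \alpha' \cos \omega (t_0 + \Delta) \\ -\alpha' (\omega/\omega_0) \sin \omega (t_0 + \Delta) \end{bmatrix},$$

where $\alpha' = \alpha/(\omega_0^2 - \omega^2)$.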
a. Fill in the details of the procedure 


(define (((C alpha omega omega0) delta-t) state) 
.) 


that implements the time-evolution transformation of the driven har- 
monic oscillator. 


b. In terms of C the general solution emanating from a given state is 


(define (((solution alpha omega omega0) state0) t)
  (((C alpha omega omega0) (- t (time state0))) state0))
Check that the implementation of C is correct by using it to construct the
solution and verifying that the solution satisfies Hamilton’s equations. 
Further check the solution by comparing to numerical integration. 


c. We know that for any phase-space state function F the rate of change
of that function along a solution path σ is

$$D(F \circ \sigma) = \partial_0 F \circ \sigma + \{F, H\} \circ \sigma.$$


Show, by writing a short program to test it, that this is true of the 
function implemented by (C delta) for the driven oscillator. Why is 
this interesting? 


d. Verify that both C and Cp are symplectic using symplectic?. 


e. Use the procedure canonical? to verify that both C and Cp are canon- 
ical with the appropriate transformed Hamiltonian. 
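Here is a possible starting point for parts a and b, offered as a sketch only. It transcribes the closed-form solution above into Python and compares it with a plain fixed-step fourth-order Runge-Kutta integration of Hamilton's equations. It also checks, by finite differences, that the qp part of the map has unit Jacobian determinant, which is the one-degree-of-freedom form of the symplectic condition. The function names, parameter values, and step count are illustrative assumptions. The exercise itself asks for a Scheme procedure.

```python
import math

def C(alpha, omega, omega0):
    """Closed-form time evolution for H = p^2/2 + omega0^2 q^2/2 - alpha q cos(omega t)."""
    ap = alpha / (omega0**2 - omega**2)      # alpha' (requires omega != omega0)
    def C_delta(delta):
        def evolve(state):
            t0, q0, p0 = state
            # free-oscillation part at t0, with the momentum scaled by 1/omega0
            x = q0 - ap * math.cos(omega * t0)
            y = (p0 + ap * omega * math.sin(omega * t0)) / omega0
            c, s = math.cos(omega0 * delta), math.sin(omega0 * delta)
            t = t0 + delta
            q = c * x + s * y + ap * math.cos(omega * t)
            p_scaled = -s * x + c * y - ap * (omega / omega0) * math.sin(omega * t)
            return (t, q, omega0 * p_scaled)
        return evolve
    return C_delta

def rk4(alpha, omega, omega0, state, delta, n=2000):
    """Fixed-step RK4 for dq/dt = p, dp/dt = -omega0^2 q + alpha cos(omega t)."""
    def f(t, q, p):
        return p, -omega0**2 * q + alpha * math.cos(omega * t)
    t, q, p = state
    h = delta / n
    for _ in range(n):
        k1 = f(t, q, p)
        k2 = f(t + h / 2, q + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = f(t + h / 2, q + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = f(t + h, q + h * k3[0], p + h * k3[1])
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return (t, q, p)

def qp_jacobian_det(transform, state, h=1e-6):
    """Central-difference det of d(q, p)/d(q0, p0) at fixed initial time."""
    t, q, p = state
    _, qa, pa = transform((t, q + h, p))
    _, qb, pb = transform((t, q - h, p))
    _, qc, pc = transform((t, q, p + h))
    _, qd, pd = transform((t, q, p - h))
    return ((qa - qb) * (pc - pd) - (qc - qd) * (pa - pb)) / (4 * h * h)

params = (0.3, 1.7, 1.0)        # alpha, omega, omega0 (illustrative values)
s0 = (0.4, 1.0, -0.5)           # (t0, q0, p0)
exact = C(*params)(3.0)(s0)
numeric = rk4(*params, s0, 3.0)
print(exact)
print(numeric)                                # should agree closely with exact
print(qp_jacobian_det(C(*params)(3.0), s0))   # 1 up to rounding
```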


5.7.1 Another View of Time Evolution 


We can also show that time evolution generates canonical trans- 
formations using the Poincaré-Cartan integral invariant. 

Consider a two-dimensional region of phase-space coordinates,
R', at some particular time t' (see figure 5.8). Let R be the image
of this region at time t under time evolution for a time interval
of Δ. The time evolution is governed by a Hamiltonian H. Let
$\sum_i A_i$ be the sum of the oriented areas of the projections of R
onto the fundamental canonical planes.²⁷ Similarly, let $\sum_i A'_i$ be
the sum of oriented projected areas for R'. We will show that
$\sum_i A_i = \sum_i A'_i$, and thus the Poincaré integral invariant is
preserved by time evolution. By showing that the Poincaré integral
invariant is preserved we will have shown that the qp part of the
transformation generated by time evolution is symplectic. From
this we can construct canonical transformations from time evolution
as before.

In the extended phase space we see that the evolution sweeps
out a cylindrical volume with endcaps the regions R' and R, each
at a fixed time. Let R'' be the two-dimensional region swept out
by the trajectories that map the boundary of region R' to the


27By Stokes’ theorem we may compute the area of a region by a line integral 
around the boundary of the region. We define the positive sense of the area 
to be the area enclosed by a curve that is traversed in a counterclockwise 
direction, when drawn on a plane with the coordinate on the abscissa and the 
momentum on the ordinate. 


Figure 5.8 All points in some two-dimensional region R' in phase
space at time t' are evolved for some time interval Δ. At the time t
the set of points defines the two-dimensional region R. For example, the
state labelled by the phase-space coordinates (t', q', p') evolves to the
state labelled by the coordinates (t, q, p).


boundary of region R. The regions R, R', and R'' together form
the boundary of a volume of phase state space.

The Poincaré-Cartan integral invariant on the whole boundary
is zero.²⁸ Thus

$$\sum_{i=0}^{n} A_i - \sum_{i=0}^{n} A'_i + \sum_{i=0}^{n} A''_i = 0, \tag{5.349}$$


28We can see this in the following way. Let γ be any closed curve in the
boundary. This curve divides the boundary into two regions. By Stokes'
theorem the integral invariant over each of these pieces can be written as a
line integral along γ, but the two line integrals have opposite signs, because γ is
traversed in opposite directions to keep the surface on the left. So we conclude
that the integral invariant over the entire surface is zero.
where the index n indicates the (t, T) canonical plane. The second
term is negative, because in the extended phase space we take the
area to be positive if the normal to the surface is outward pointing.

We will show that the Poincaré-Cartan integral invariant for a
region of phase space that is generated by time evolution is zero:

$$\sum_{i=0}^{n} A''_i = 0. \tag{5.350}$$

This will allow us to conclude

$$\sum_{i=0}^{n} A_i - \sum_{i=0}^{n} A'_i = 0. \tag{5.351}$$

The areas of the projections of R and R' on the (t, T) plane are zero
because R and R' are at constant times, so for these regions the
Poincaré-Cartan integral invariant is the same as the Poincaré
integral invariant. Thus

$$\sum_{i=0}^{n-1} A_i = \sum_{i=0}^{n-1} A'_i. \tag{5.352}$$


We are left with showing that the Poincaré-Cartan integral
invariant for the region R'' is zero. This will be zero if the
contribution from any small piece of R'' is zero. We will show this by
showing that the ω form for a small parallelogram in this region
is zero. Let (0; q, t; p, T) be a vertex of this parallelogram. The
parallelogram is specified by two edges $\zeta_1$ and $\zeta_2$ emanating from
this vertex with components (0; Δq, Δt; Δp, ΔT). For edge $\zeta_1$ of
the parallelogram we take a constant-time phase-space increment
with length Δq and Δp in the q and p directions. The first-order
change in the Hamiltonian that corresponds to these changes is

$$\Delta H = \partial_1 H(t, q, p)\,\Delta q + \partial_2 H(t, q, p)\,\Delta p \tag{5.353}$$

for constant time, Δt = 0. The increment ΔT is the negative of
ΔH. So the extended phase-space increment is

$$\zeta_1 = (0;\ \Delta q, 0;\ \Delta p, -\partial_1 H(t, q, p)\,\Delta q - \partial_2 H(t, q, p)\,\Delta p). \tag{5.354}$$
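The edge $\zeta_2$ is obtained by time evolution of the vertex for a time
interval Δt. Using Hamilton's equations we obtain

$$\begin{aligned} \zeta_2 &= (0;\ Dq(t)\,\Delta t, \Delta t;\ Dp(t)\,\Delta t, DT(t)\,\Delta t) \\ &= (0;\ \partial_2 H(t, q, p)\,\Delta t, \Delta t;\ -\partial_1 H(t, q, p)\,\Delta t, -\partial_0 H(t, q, p)\,\Delta t). \end{aligned} \tag{5.355}$$

The ω form applied to these incremental states that form the edges
of this parallelogram gives the area of the parallelogram (all partial
derivatives of H below are evaluated at (t, q, p)):

$$\begin{aligned} \omega(\zeta_1, \zeta_2) &= Q(\zeta_1) \cdot P(\zeta_2) - P(\zeta_1) \cdot Q(\zeta_2) \\ &= (\Delta q, 0) \cdot (-\partial_1 H\,\Delta t,\ -\partial_0 H\,\Delta t) \\ &\quad - (\Delta p,\ -\partial_1 H\,\Delta q - \partial_2 H\,\Delta p) \cdot (\partial_2 H\,\Delta t,\ \Delta t) \\ &= -\partial_1 H\,\Delta q\,\Delta t - \left(-\partial_1 H\,\Delta q\,\Delta t\right) \\ &= 0. \end{aligned} \tag{5.356}$$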


So we may conclude that the integral of this expression over the 
entire surface of the tube of trajectories is also zero. Thus the 
Poincaré-Cartan integral invariant is zero for any region that is 
generated by time evolution. 
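The cancellation in (5.356) does not depend on the particular Hamiltonian. As a quick numerical sanity check, the sketch below builds the edges (5.354) and (5.355) for one degree of freedom at random vertices and increments and evaluates the ω form. It uses an arbitrarily chosen time-dependent Hamiltonian H(t, q, p) = p²/2 + q⁴/4 + tqp. That Hamiltonian is our own example, and its partial derivatives are written out by hand.

```python
import random

def partials(t, q, p):
    """(d0 H, d1 H, d2 H) for the illustrative H = p^2/2 + q^4/4 + t*q*p."""
    return q * p, q**3 + t * p, p + t * q

def edges(t, q, p, dq, dp, dt):
    """Edge increments as (dq, dt, dp, dT); the independent-variable slot is omitted."""
    d0, d1, d2 = partials(t, q, p)
    zeta1 = (dq, 0.0, dp, -d1 * dq - d2 * dp)   # constant-time edge, eq. (5.354)
    zeta2 = (d2 * dt, dt, -d1 * dt, -d0 * dt)   # trajectory edge, eq. (5.355)
    return zeta1, zeta2

def omega_form(z1, z2):
    """Q(z1).P(z2) - P(z1).Q(z2) with Q = (dq, dt) and P = (dp, dT)."""
    dq1, dt1, dp1, dT1 = z1
    dq2, dt2, dp2, dT2 = z2
    return (dq1 * dp2 + dt1 * dT2) - (dp1 * dq2 + dT1 * dt2)

random.seed(1)
for _ in range(5):
    vertex_and_steps = [random.uniform(-2.0, 2.0) for _ in range(6)]
    print(omega_form(*edges(*vertex_and_steps)))   # zero up to rounding
```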

Having proven that the trajectory tube provides no contribution,
we have shown that the Poincaré integral invariant of the two
endcaps is the same. This proves that time evolution generates a
symplectic qp transformation.
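The same conclusion can be probed numerically for a nonlinear system. This sketch assumes a pendulum with H = p²/2 − cos q, taking unit mass, length, and gravity. It evolves states with a fixed-step Runge-Kutta integrator and estimates the determinant of the Jacobian of the qp part of the time-evolution map by central differences. For one degree of freedom, symplectic means exactly that this determinant is 1. For contrast, adding a damping term γ, which is not Hamiltonian, gives a determinant of exp(−γT).

```python
import math

def flow(q, p, T, gamma=0.0, n=500):
    """RK4 evolution of dq/dt = p, dp/dt = -sin q - gamma p over time T."""
    def f(q, p):
        return p, -math.sin(q) - gamma * p
    h = T / n
    for _ in range(n):
        k1 = f(q, p)
        k2 = f(q + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = f(q + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = f(q + h * k3[0], p + h * k3[1])
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, p

def jacobian_det(q, p, T, gamma=0.0, eps=1e-5):
    """Central-difference determinant of d(q(T), p(T))/d(q, p)."""
    qa, pa = flow(q + eps, p, T, gamma)
    qb, pb = flow(q - eps, p, T, gamma)
    qc, pc = flow(q, p + eps, T, gamma)
    qd, pd = flow(q, p - eps, T, gamma)
    return ((qa - qb) * (pc - pd) - (qc - qd) * (pa - pb)) / (4 * eps * eps)

print(jacobian_det(1.0, 0.5, 5.0))              # Hamiltonian flow: close to 1
print(jacobian_det(1.0, 0.5, 5.0, gamma=0.2))   # damped flow: close to exp(-1.0)
```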


Area preservation of surfaces of section 

We can use the Poincaré-Cartan invariant to prove that for
autonomous two-degree-of-freedom systems, surfaces of section
(constructed appropriately) preserve area.

To show this we consider a surface of section for one coordinate 
(say q2) equal to zero, and we construct the section by accumulat- 
ing the (q1,p1) pairs. We assume that all initial conditions have 
the same energy. We compute the sum of the areas of canonical 
projections in the extended phase space again. Because all initial 
conditions have the same q2 = 0 there is no area on the (q2, p2) 
plane and because all the trajectories have the same value of the 
Hamiltonian the area of the projection on the (t, T) plane is also 
zero. So the sum of areas of the projections is just the area of the 
region on the surface of section. Now let each point on the surface 
of section evolve to the next section crossing. For each point on 
the section this may take a different amount of time. Compute the
sum of the areas again for the mapped region. Again, all points 
of the mapped region have the same q2 so the area on the (q2, p2) 
plane is zero, and they continue to have the same energy so the 
area on the (t, T) plane is zero. So the area of the mapped re- 
gion is again just the area on the surface of section, the (q1, p1) 
plane. Time evolution preserves the sum of areas, so the area on 
the surface of section is the same as the mapped area. 

So surfaces of section preserve area provided that the section
points are entirely on a canonical plane. For example, for the
Hénon-Heiles surfaces of section we plotted $p_y$ versus y when x = 0
with $p_x > 0$. So for all section points the x coordinate has the
fixed value 0, the trajectories all have the same energy, and the
points accumulated are entirely in the (y, $p_y$) canonical plane. So
the Hénon-Heiles surfaces of section preserve area.


5.7.2 Yet Another View of Time Evolution 


We can show that time evolution generates a canonical transfor- 
mation directly from the action principle. 
Recall that the Lagrangian action S is

$$S[q](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[q]. \tag{5.357}$$


We computed the variation of the action in deriving the Lagrange
equations. The variation is (see equation 1.33)

$$\delta_\eta S[q](t_1, t_2) = \left(\partial_2 L \circ \Gamma[q]\right)\eta\Big|_{t_1}^{t_2} - \int_{t_1}^{t_2} \left(E[L] \circ \Gamma[q]\right)\eta, \tag{5.358}$$

rewritten in terms of the Euler-Lagrange operator E. In the derivation
of the Lagrange equations we considered only variations that
preserved the endpoints of the path being tested. However,
equation (5.358) is true of arbitrary variations. Here we consider
variations that are not zero at the endpoints around a realizable path q
(one for which $E[L] \circ \Gamma[q] = 0$). For these variations the variation
of the action is just the integrated term:

$$\delta_\eta S[q](t_1, t_2) = \left(\partial_2 L \circ \Gamma[q]\right)\eta\Big|_{t_1}^{t_2} = p(t_2)\,\eta(t_2) - p(t_1)\,\eta(t_1). \tag{5.359}$$


Recall that p and η are structures, and the product implies a sum
of products of components.
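Equation (5.359) is easy to test numerically. The sketch below assumes a harmonic oscillator with m = k = 1, so L = ½q̇² − ½q² and p = q̇. It uses a hypothetical one-parameter family of realizable paths q(s)(t) = cos t + s sin t, whose variation η(t) = sin t is not zero at the endpoints. It compares a finite-difference derivative of the action with respect to s against the boundary term p(t₂)η(t₂) − p(t₁)η(t₁).

```python
import math

def action(s, t1, t2, n=2000):
    """Action of q(t) = cos t + s sin t for L = qdot^2/2 - q^2/2 (Simpson's rule, n even)."""
    def lagrangian(t):
        q = math.cos(t) + s * math.sin(t)
        qdot = -math.sin(t) + s * math.cos(t)
        return 0.5 * qdot**2 - 0.5 * q**2
    h = (t2 - t1) / n
    total = lagrangian(t1) + lagrangian(t2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * lagrangian(t1 + i * h)
    return total * h / 3

def boundary_term(s, t1, t2):
    """p(t2) eta(t2) - p(t1) eta(t1) with p = qdot and eta(t) = sin t."""
    def p(t):
        return -math.sin(t) + s * math.cos(t)
    return p(t2) * math.sin(t2) - p(t1) * math.sin(t1)

s, t1, t2, ds = 0.7, 0.3, 2.1, 1e-5
dS = (action(s + ds, t1, t2) - action(s - ds, t1, t2)) / (2 * ds)
print(dS, boundary_term(s, t1, t2))   # the two values agree closely
```

No integral term survives because each member of the family satisfies q̈ + q = 0, which is the realizability condition $E[L] \circ \Gamma[q] = 0$ for this Lagrangian.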
Consider a continuous family of realizable paths; the path for
parameter s is $\tilde q(s)$, and the coordinates of this path at time t are
$\tilde q(s)(t)$. We define $\tilde\eta(s) = D\tilde q(s)$; the variation of the path along
the family is the derivative of the parametric path with respect to
the parameter. Let

$$\tilde S(s) = S[\tilde q(s)](t_1, t_2) \tag{5.360}$$

be the value of the action from $t_1$ to $t_2$ for path $\tilde q(s)$. The
derivative of the action along this parametric family of paths is²⁹


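$$\begin{aligned} D\tilde S(s) &= \delta_{\tilde\eta(s)} S[\tilde q(s)](t_1, t_2) \\ &= \left(\partial_2 L \circ \Gamma[\tilde q(s)]\right)\tilde\eta(s)\Big|_{t_1}^{t_2} - \int_{t_1}^{t_2} \left(E[L] \circ \Gamma[\tilde q(s)]\right)\tilde\eta(s). \end{aligned} \tag{5.361}$$

Since $\tilde q(s)$ is a realizable path, $E[L] \circ \Gamma[\tilde q(s)] = 0$. So

$$\begin{aligned} D\tilde S(s) &= \left(\partial_2 L \circ \Gamma[\tilde q(s)]\right)\tilde\eta(s)\Big|_{t_1}^{t_2} \\ &= \tilde p(s)(t_2)\,\tilde\eta(s)(t_2) - \tilde p(s)(t_1)\,\tilde\eta(s)(t_1), \end{aligned} \tag{5.362}$$

where $\tilde p(s)$ is the momentum conjugate to $\tilde q(s)$. The integral of
$D\tilde S$ is

$$\tilde S(s_2) - \tilde S(s_1) = \int_{s_1}^{s_2} D\tilde S = \int_{s_1}^{s_2} \left(h(t_2) - h(t_1)\right), \tag{5.363}$$

where

$$h(t)(s) = \tilde p(s)(t)\,\tilde\eta(s)(t). \tag{5.364}$$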


In conventional notation the latter line integral is written

$$\int_{\gamma_2} \sum_i p_i\,dq^i - \int_{\gamma_1} \sum_i p_i\,dq^i, \tag{5.365}$$

where $\gamma_1(s) = \tilde q(s)(t_1)$ and $\gamma_2(s) = \tilde q(s)(t_2)$.


29Let f be a path-dependent function, $\tilde\eta(s) = D\tilde q(s)$, and $g(s) = f[\tilde q(s)]$. The
variation of f at $\tilde q(s)$ in the direction $\tilde\eta(s)$ is $\delta_{\tilde\eta(s)} f[\tilde q(s)] = Dg(s)$.
For a loop family of paths (such that $\tilde q(s_2) = \tilde q(s_1)$), the
difference of actions at the endpoints vanishes, so we deduce

$$\oint_{\gamma_2} \sum_i p_i\,dq^i = \oint_{\gamma_1} \sum_i p_i\,dq^i, \tag{5.366}$$

which is the line-integral version of the integral invariants.
In terms of area integrals, using Stokes' theorem, this is

$$\sum_i \iint_{R_2^i} dp_i\,dq^i = \sum_i \iint_{R_1^i} dp_i\,dq^i, \tag{5.367}$$

where $R_j^i$ are the regions in the ith canonical plane. We have found
that time evolution preserves the integral invariants; thus time
evolution generates a canonical transformation.


5.8 Hamilton-Jacobi Equation 


If we could find a canonical transformation so that the transformed
Hamiltonian were identically zero, then by Hamilton's equations
the new coordinates and momenta would be constants. All of the
time variation of the solution would be captured in the canonical
transformation, and there would be nothing more to the solution.
The mixed-variable generating function that does this job satisfies
a partial differential equation called the Hamilton-Jacobi equation.
In most cases, the Hamilton-Jacobi equation cannot be solved
explicitly. When it can be solved, the Hamilton-Jacobi equation
provides a means of reducing a problem to a useful simple form.
Recall the relations satisfied by an $F_2$-type generating function:

$$q' = \partial_2 F_2(t, q, p') \tag{5.368}$$

$$p = \partial_1 F_2(t, q, p') \tag{5.369}$$

$$H'(t, q', p') = H(t, q, p) + \partial_0 F_2(t, q, p'). \tag{5.370}$$

If we require the new Hamiltonian to be zero, then $F_2$ must satisfy
the equation

$$0 = H(t, q, \partial_1 F_2(t, q, p')) + \partial_0 F_2(t, q, p'). \tag{5.371}$$


So the solution of the problem is "reduced" to the problem of
solving an n-dimensional partial differential equation for $F_2$ with
unspecified new (constant) momenta p'. This is the Hamilton-Jacobi
equation, and in some cases we can solve it.

We can also attempt a somewhat less drastic method of solution.
Rather than try to find an $F_2$ that makes the new Hamiltonian
identically zero, we can seek an $F_2$-shaped function W that
gives a new Hamiltonian that is solely a function of the new momenta.
A system described by this form of Hamiltonian is also
easy to solve. So if we set

$$H'(t, q', p') = H(t, q, \partial_1 W(t, q, p')) + \partial_0 W(t, q, p') = E(p') \tag{5.372}$$

and are able to solve for W then the problem is essentially solved.
In this case, the primed momenta are all constant, and the primed
positions are linear in time. This is an alternate form of the
Hamilton-Jacobi equation.

These forms are related. Suppose that we have a W that satisfies
the second form of the Hamilton-Jacobi equation (5.372).
Then the $F_2$ constructed from W

$$F_2(t, q, p') = W(t, q, p') - E(p')\,t \tag{5.373}$$

satisfies the first form of the Hamilton-Jacobi equation (5.371).
Furthermore

$$p = \partial_1 F_2(t, q, p') = \partial_1 W(t, q, p'), \tag{5.374}$$

so the primed momenta are the same in the two formulations. But

$$\begin{aligned} q' &= \partial_2 F_2(t, q, p') \\ &= \partial_2 W(t, q, p') - DE(p')\,t \\ &= q'' - DE(p')\,t, \end{aligned} \tag{5.375}$$

where q'' is the new coordinate generated by W. So we see that the
primed coordinates differ by a term that is linear in time; with $F_2$,
both $p'(t) = p'_0$ and $q'(t) = q'_0$ are constant. Thus
we can use either W or $F_2$ as the generating function, depending
on the form of the new Hamiltonian that we want.

Note that if H is time independent then we can often find a 
time-independent W that does the job. For time-independent W 
the Hamilton-Jacobi equation simplifies to 


$$E(p') = H(t, q, \partial_1 W(t, q, p')). \tag{5.376}$$

The corresponding $F_2$ is then linear in time. Notice that an implicit
requirement is that the energy can be written as a function
of the new momenta alone. This excludes the possibility that the
transformed phase-space coordinates q' and p' are simply initial
conditions for q and p.


Exercise 5.24: Hamilton-Jacobi with $F_1$


We have used an $F_2$-type generating function to carry out the Hamilton-Jacobi
transformations. Carry out the equivalent transformations with
an $F_1$-type generating function. Find the equations corresponding to
equations (5.371), (5.372), and (5.376).


5.8.1 Harmonic Oscillator 


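Consider the familiar time-independent Hamiltonian

$$H(t, x, p) = \frac{p^2}{2m} + \frac{k x^2}{2}. \tag{5.377}$$

We form the Hamilton-Jacobi equation for this problem:

$$0 = H(t, x, \partial_1 F_2(t, x, p')) + \partial_0 F_2(t, x, p'). \tag{5.378}$$

Using $F_2(t, x, p') = W(t, x, p') - E(p')\,t$, we find

$$E(p') = H(t, x, \partial_1 W(t, x, p')). \tag{5.379}$$

Writing this out explicitly,

$$E(p') = \frac{(\partial_1 W(t, x, p'))^2}{2m} + \frac{k x^2}{2}, \tag{5.380}$$

and solving for $\partial_1 W$,

$$\partial_1 W(t, x, p') = \sqrt{2m\left(E(p') - \frac{k x^2}{2}\right)}. \tag{5.381}$$

Integrating gives the desired W:

$$W(t, x, p') = \int^x \sqrt{2m\left(E(p') - \frac{k z^2}{2}\right)}\,dz. \tag{5.382}$$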
2 


We can use either W or the corresponding $F_2$ as the generating
function. First, take W to be the generating function. We obtain
the coordinate transformation by differentiating

$$x' = \partial_2 W(t, x, p') = \int^x \frac{m\,DE(p')}{\sqrt{2m\left(E(p') - \frac{k z^2}{2}\right)}}\,dz \tag{5.383}$$

and then integrating to get

$$x' = \sqrt{\frac{m}{k}}\,DE(p')\,\arcsin\left(\frac{x}{\sqrt{2E(p')/k}}\right) + C(p'), \tag{5.384}$$

with some integration constant C(p'). Inverting this, we get the
unprimed coordinate in terms of the primed coordinate and momentum:

$$x = \sqrt{\frac{2E(p')}{k}}\,\sin\left(\sqrt{\frac{k}{m}}\,\frac{x' - C(p')}{DE(p')}\right). \tag{5.385}$$


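The new Hamiltonian H' depends only on the momentum:

$$H'(t, x', p') = E(p'). \tag{5.386}$$

The equations of motion are just

$$\begin{aligned} Dx'(t) &= \partial_2 H'(t, x'(t), p'(t)) = DE(p'(t)) \\ Dp'(t) &= -\partial_1 H'(t, x'(t), p'(t)) = 0, \end{aligned} \tag{5.387}$$

with solution

$$x'(t) = DE(p'_0)\,t + x'_0, \qquad p'(t) = p'_0 \tag{5.388}$$

for initial conditions $x'_0$ and $p'_0$. If we plug these expressions for
x'(t) and p'(t) into equation (5.385) we find

$$\begin{aligned} x(t) &= \sqrt{\frac{2E(p'_0)}{k}}\,\sin\left(\frac{\omega}{DE(p'_0)}\left(DE(p'_0)\,t + x'_0 - C(p'_0)\right)\right) \\ &= \sqrt{\frac{2E(p'_0)}{k}}\,\sin\left(\omega t + \frac{\omega\,(x'_0 - C(p'_0))}{DE(p'_0)}\right) \\ &= A \sin(\omega t + \varphi), \end{aligned} \tag{5.389}$$

where the angular frequency is $\omega = \sqrt{k/m}$, the amplitude is
$A = \sqrt{2E(p')/k}$, and the phase is $\varphi = -\omega t_0 = \omega(x'_0 - C(p'))/DE(p')$.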
We can also use $F_2 = W - Et$ as the generating function. The
new Hamiltonian is zero, so both x' and p' are constant, but the
relationship between the old and new variables is

$$\begin{aligned} x' &= \partial_2 F_2(t, x, p') \\ &= \partial_2 W(t, x, p') - DE(p')\,t \\ &= \sqrt{\frac{m}{k}}\,DE(p')\,\arcsin\left(\frac{x}{\sqrt{2E(p')/k}}\right) + C(p') - DE(p')\,t. \end{aligned} \tag{5.390}$$

Plugging in the solution $x' = x'_0$ and $p' = p'_0$ and solving for
x we find equation (5.389). So once again we see that the two
approaches are equivalent.
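Here is a small numerical illustration of this equivalence. It rests on assumptions of our own: E(p') = p' (so DE(p') = 1), C(p') = 0, m = 1, and k = 4. Along an actual oscillator trajectory of the form (5.389), the new coordinate x' computed from (5.390) should stay constant. The sample times are restricted so that ωt + φ stays in (−π/2, π/2), where the principal branch of arcsin applies.

```python
import math

m, k = 1.0, 4.0
omega = math.sqrt(k / m)          # 2.0
p_new = 2.0                       # constant new momentum p'
E, DE, C = p_new, 1.0, 0.0        # assumed choices: E(p') = p', so DE = 1; C(p') = 0
A = math.sqrt(2 * E / k)          # amplitude, here 1.0
phi = 0.3                         # phase of the chosen trajectory

def x_of_t(t):
    """Oscillator trajectory in the form (5.389)."""
    return A * math.sin(omega * t + phi)

def x_new(t, x):
    """New coordinate x' from the F2 relation (5.390)."""
    return math.sqrt(m / k) * DE * math.asin(x / A) + C - DE * t

samples = [x_new(t, x_of_t(t)) for t in (-0.8, -0.3, 0.0, 0.2, 0.5)]
print(samples)   # each entry is phi/omega = 0.15 up to rounding
```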

It is interesting to note that the solution depends upon the 
constants E(p') and DE(p’) but otherwise the motion is not de- 
pendent in any essential way on what the function E actually is. 
The momentum p’ is constant and the values of the constants are 
set by the initial conditions. Given a particular function E the 
initial conditions determine p’, but the solution can be obtained 
without further specifying the E function. 

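If we choose particular functions E we can get particular canonical
transformations. For example, a convenient choice is simply

$$E(p') = \alpha p', \tag{5.391}$$

for some constant α that will be chosen later. Taking C(p') = 0, we find

$$x = \sqrt{\frac{2\alpha p'}{k}}\,\sin\left(\frac{\omega x'}{\alpha}\right). \tag{5.392}$$

So we see that a convenient choice is $\alpha = \omega = \sqrt{k/m}$, so

$$x = \sqrt{\frac{2p'}{\beta}}\,\sin x', \tag{5.393}$$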




with $\beta = \sqrt{km}$. The new Hamiltonian is

$$H'(t, x', p') = E(p') = \omega p'. \tag{5.394}$$

The solutions are just $x' = \omega t + x'_0$ and $p' = p'_0$. Substituting the
expression for x in terms of x' and p' into $H(t, x, p) = H'(t, x', p')$
we derive

$$p = \sqrt{2p'\beta}\,\cos x'. \tag{5.395}$$


The two transformation equations (5.393) and (5.395) are what 
we have called the polar-canonical transformation (equation 5.34). 
We have already shown that this transformation is canonical and 
that it solves the harmonic oscillator, but it was not derived. Here 
we have derived this transformation as a particular case of the 
solution of the Hamilton-Jacobi equation. 
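Here is a numerical check of the derived transformation, offered as a sketch. It takes the illustrative values m = 2 and k = 8, so ω = 2 and β = 4. At a few sample points it confirms that the Jacobian determinant of (x', p') ↦ (x, p) is 1, which is the symplectic condition for one degree of freedom. It also confirms that H(t, x, p) equals ωp'.

```python
import math

m, k = 2.0, 8.0
omega = math.sqrt(k / m)      # 2.0
beta = math.sqrt(k * m)       # 4.0

def polar(xp, pp):
    """Polar-canonical map (x', p') -> (x, p), equations (5.393) and (5.395)."""
    return (math.sqrt(2 * pp / beta) * math.sin(xp),
            math.sqrt(2 * pp * beta) * math.cos(xp))

def H(x, p):
    """Harmonic-oscillator Hamiltonian (5.377)."""
    return p**2 / (2 * m) + k * x**2 / 2

def jac_det(f, a, b, eps=1e-6):
    """Central-difference determinant of d(f1, f2)/d(a, b)."""
    fa1, fa2 = f(a + eps, b)
    fb1, fb2 = f(a - eps, b)
    fc1, fc2 = f(a, b + eps)
    fd1, fd2 = f(a, b - eps)
    return ((fa1 - fb1) * (fc2 - fd2) - (fc1 - fd1) * (fa2 - fb2)) / (4 * eps**2)

for xp, pp in [(0.3, 1.0), (2.0, 0.5), (-1.2, 3.0)]:
    x, p = polar(xp, pp)
    print(jac_det(polar, xp, pp), H(x, p), omega * pp)   # ~1, then two equal values
```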

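We can also explore other choices for the E function. For example,
we could choose

$$E(p') = \tfrac{1}{2}\alpha p'^2. \tag{5.396}$$

Following the same steps as before (again with C(p') = 0),

$$x = p'\sqrt{\frac{\alpha}{k}}\,\sin\left(\frac{\omega x'}{\alpha p'}\right). \tag{5.397}$$

So a convenient choice is again α = ω, leaving

$$x = \frac{p'}{\beta}\,\sin\frac{x'}{p'}, \qquad p = \beta p'\cos\frac{x'}{p'}, \tag{5.398}$$

with $\beta = (km)^{1/4}$. By construction, this transformation is also
canonical and also brings the harmonic oscillator problem into an
easily solvable form:

$$H'(t, x', p') = \tfrac{1}{2}\omega p'^2. \tag{5.399}$$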


The harmonic oscillator Hamiltonian has been transformed to
what looks a lot like the Hamiltonian for a free particle. This
is very interesting. Notice that whereas Hamiltonian (5.394) does
not have a well-defined Legendre transform to an equivalent
Lagrangian (its velocity $\partial_2 H' = \omega$ does not depend on p'), the
“free particle” harmonic oscillator has a well-defined Legendre transform:

$$L'(t, x', v') = \frac{v'^2}{2\omega}. \tag{5.400}$$

Of course, there may be additional properties that make one choice
more useful than others for particular applications.
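The transformation (5.398) can be checked the same way, here with hand-computed partial derivatives rather than finite differences. The sketch uses m = 2 and k = 8 as illustrative values only, so ω = 2 and β = (km)^(1/4) = 2. It verifies three things: the Jacobian determinant is 1, H(t, x, p) = ½ωp'², and the Legendre transform of H' reproduces the Lagrangian (5.400).

```python
import math

m, k = 2.0, 8.0
omega = math.sqrt(k / m)       # 2.0
beta = (k * m) ** 0.25         # 2.0

def transform(xp, pp):
    """Map (x', p') -> (x, p) of equation (5.398)."""
    u = xp / pp
    return pp / beta * math.sin(u), beta * pp * math.cos(u)

def jacobian_det(xp, pp):
    """det d(x, p)/d(x', p') from hand-computed partial derivatives."""
    u = xp / pp
    dx_dxp = math.cos(u) / beta
    dx_dpp = (math.sin(u) - u * math.cos(u)) / beta
    dp_dxp = -beta * math.sin(u)
    dp_dpp = beta * (math.cos(u) + u * math.sin(u))
    return dx_dxp * dp_dpp - dx_dpp * dp_dxp

def H(x, p):
    """Harmonic-oscillator Hamiltonian (5.377)."""
    return p**2 / (2 * m) + k * x**2 / 2

for xp, pp in [(0.4, 1.0), (2.5, 0.7), (-3.0, 2.0)]:
    x, p = transform(xp, pp)
    print(jacobian_det(xp, pp), H(x, p), 0.5 * omega * pp**2)

# Legendre transform of H' = omega p'^2 / 2: v' = omega p', L' = p' v' - H' = v'^2 / (2 omega).
p_demo = 1.3
v_demo = omega * p_demo
print(p_demo * v_demo - 0.5 * omega * p_demo**2, v_demo**2 / (2 * omega))
```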


Exercise 5.25: Pendulum 


Solve the Hamilton-Jacobi equation for the pendulum; investigate both 
the circulating and oscillating regions of phase space. (Note: This is a 
long story and requires some knowledge of elliptic functions.) 


5.8.2 Kepler Problem 


We can use the Hamilton-Jacobi equation to find canonical coor- 
dinates that solve the Kepler problem. This is an essential first 
step to doing perturbation theory for orbital problems. 

In rectangular coordinates (x, y, z), the Kepler Hamiltonian is

$$H(t; x, y, z; p_x, p_y, p_z) = \frac{p^2}{2m} - \frac{\mu}{r}, \tag{5.401}$$

where $r^2 = x^2 + y^2 + z^2$ and $p^2 = p_x^2 + p_y^2 + p_z^2$. The Kepler problem
describes the relative motion of two bodies; it is also encountered
in the formulation of other problems involving orbital motion such
as the n-body problem.
We try a generating function of the form $W(t; x, y, z; p'_x, p'_y, p'_z)$.
The Hamilton-Jacobi equation is then³⁰

$$E(p') = \frac{1}{2m}\Big[(\partial_{1,0} W(t; x, y, z; p'_x, p'_y, p'_z))^2 + (\partial_{1,1} W(t; x, y, z; p'_x, p'_y, p'_z))^2 + (\partial_{1,2} W(t; x, y, z; p'_x, p'_y, p'_z))^2\Big] - \frac{\mu}{r}. \tag{5.402}$$


30Remember that $\partial_{1,0}$ means the derivative with respect to the first coordinate
position.
This is a partial differential equation in the three partial derivatives
of W. We stare at it a while and give up.

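Next we try converting to spherical coordinates. This is motivated
by the fact that the potential energy depends only on r.
The Hamiltonian in spherical coordinates (r, θ, φ), where θ is the
colatitude and φ is the longitude, is

$$H(t; r, \theta, \varphi; p_r, p_\theta, p_\varphi) = \frac{1}{2m}\left[p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\varphi^2}{r^2 \sin^2\theta}\right] - \frac{\mu}{r}. \tag{5.403}$$

The Hamilton-Jacobi equation is

$$E(p'_1, p'_2, p'_3) = \frac{1}{2m}\left[(\partial_{1,0} W)^2 + \frac{1}{r^2}(\partial_{1,1} W)^2 + \frac{1}{r^2 \sin^2\theta}(\partial_{1,2} W)^2\right] - \frac{\mu}{r}, \tag{5.404}$$

where each partial derivative of W is evaluated at $(t; r, \theta, \varphi; p'_1, p'_2, p'_3)$.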


We can solve the Hamilton-Jacobi equation by successively isolating
the dependence on the various variables. Looking first at
the φ dependence, we see that, outside of W, φ appears only in
one partial derivative. If we write

$$W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = f(r, \theta, p'_1, p'_2, p'_3) + p'_3\,\varphi, \tag{5.405}$$

then $\partial_{1,2} W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = p'_3$, and then φ does not appear
in the remaining equation for f:

$$E(p'_1, p'_2, p'_3) = \frac{1}{2m}\left\{(\partial_0 f(r, \theta, p'_1, p'_2, p'_3))^2 + \frac{1}{r^2}\left[(\partial_1 f(r, \theta, p'_1, p'_2, p'_3))^2 + \frac{(p'_3)^2}{\sin^2\theta}\right]\right\} - \frac{\mu}{r}. \tag{5.406}$$

Any function of the $p'_i$ could have been used as the coefficient of
φ in the generating function. This particular choice has the nice
feature that $p'_3$ is the z component of the angular momentum.
We can eliminate the θ dependence if we choose

$$f(r, \theta, p'_1, p'_2, p'_3) = R(r, p'_1, p'_2, p'_3) + \Theta(\theta, p'_1, p'_2, p'_3) \tag{5.407}$$
and require that Θ solves

$$(\partial_0 \Theta(\theta, p'_1, p'_2, p'_3))^2 + \frac{(p'_3)^2}{\sin^2\theta} = (p'_2)^2. \tag{5.408}$$

We are free to choose the right-hand side to be any function of
the new momenta. This choice reflects the fact that the left-hand
side is non-negative. It turns out that $p'_2$ is the total angular
momentum. This equation for Θ can be solved by quadrature.
The remaining equation that determines R is

$$E(p'_1, p'_2, p'_3) = \frac{1}{2m}\left[(\partial_0 R(r, p'_1, p'_2, p'_3))^2 + \frac{(p'_2)^2}{r^2}\right] - \frac{\mu}{r}, \tag{5.409}$$

which also can be solved by quadrature.
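Altogether the solution of the Hamilton-Jacobi equation reads

$$\begin{aligned} W(r, \theta, \varphi, p'_1, p'_2, p'_3) &= \int^r \left(2mE(p'_1, p'_2, p'_3) + \frac{2m\mu}{r} - \frac{(p'_2)^2}{r^2}\right)^{1/2} dr \\ &\quad + \int^\theta \left((p'_2)^2 - \frac{(p'_3)^2}{\sin^2\theta}\right)^{1/2} d\theta + p'_3\,\varphi. \end{aligned} \tag{5.410}$$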


It is interesting that our solution to the Hamilton-Jacobi partial
differential equation is of the form

$$W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = R(r, p'_1, p'_2, p'_3) + \Theta(\theta, p'_1, p'_2, p'_3) + \Phi(\varphi, p'_1, p'_2, p'_3). \tag{5.411}$$

Thus we have a separation-of-variables technique that involves
writing the solution as a sum of functions of the individual variables.
This might be contrasted with the separation-of-variables
technique encountered in elementary quantum mechanics and classical
electrodynamics, which uses products of functions of individual
variables. However, integrable problems in classical mechanics are
rare, so it would be incorrect to think of this method as a general
solution method.

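The coordinates $q'_1, q'_2, q'_3$ conjugate to the momenta $p'_1, p'_2, p'_3$
are obtained by differentiating (5.410) with respect to the new momenta:

$$q'_1 = \partial_{2,0} W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = m\,\partial_0 E(p'_1, p'_2, p'_3) \int^r \left(2mE(p'_1, p'_2, p'_3) + \frac{2m\mu}{r} - \frac{(p'_2)^2}{r^2}\right)^{-1/2} dr \tag{5.412}$$

$$q'_2 = \partial_{2,1} W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = p'_2 \int^\theta \left((p'_2)^2 - \frac{(p'_3)^2}{\sin^2\theta}\right)^{-1/2} d\theta + \int^r \left(m\,\partial_1 E(p'_1, p'_2, p'_3) - \frac{p'_2}{r^2}\right)\left(2mE(p'_1, p'_2, p'_3) + \frac{2m\mu}{r} - \frac{(p'_2)^2}{r^2}\right)^{-1/2} dr \tag{5.413}$$

$$q'_3 = \partial_{2,2} W(t; r, \theta, \varphi; p'_1, p'_2, p'_3) = \varphi - p'_3 \int^\theta \frac{1}{\sin^2\theta}\left((p'_2)^2 - \frac{(p'_3)^2}{\sin^2\theta}\right)^{-1/2} d\theta + m\,\partial_2 E(p'_1, p'_2, p'_3) \int^r \left(2mE(p'_1, p'_2, p'_3) + \frac{2m\mu}{r} - \frac{(p'_2)^2}{r^2}\right)^{-1/2} dr. \tag{5.414}$$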


We are still free to choose the functional form of E. A convenient
(and conventional) choice is

$$E(p'_1, p'_2, p'_3) = -\frac{m\mu^2}{2(p'_1)^2}. \tag{5.415}$$

With this choice the momentum $p'_1$ has dimensions of angular
momentum, and the conjugate coordinate is an angle.

The Hamiltonian for the Kepler problem is reduced to

$$H'(t; q'_1, q'_2, q'_3; p'_1, p'_2, p'_3) = E(p'_1, p'_2, p'_3) = -\frac{m\mu^2}{2(p'_1)^2}. \tag{5.416}$$

Thus

$$q'_1 = nt + q'_{10} \tag{5.417}$$

$$q'_2 = q'_{20} \tag{5.418}$$

$$q'_3 = q'_{30}, \tag{5.419}$$

where $n = m\mu^2/(p'_1)^3$ and where $q'_{10}$, $q'_{20}$, and $q'_{30}$ are the initial
values. Only one of the new variables changes with time.³¹
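Footnote 31 gives the new momenta in terms of orbital elements. As a consistency check of (5.415) and of the mean motion n, the sketch below takes a bound orbit with illustrative values of m, μ, a, and e. It builds the state at pericenter from the vis-viva relation v² = (μ/m)(2/r − 1/a), which holds for the potential −μ/r. It then compares the energy, the angular momentum, and n with the expressions above. The numerical values are our own assumptions.

```python
import math

m, mu = 1.5, 2.0           # mass and force constant; potential energy -mu/r (illustrative)
a, e = 3.0, 0.4            # semimajor axis and eccentricity of a bound orbit (illustrative)

# State at pericenter: radius a(1 - e); speed from vis-viva, v^2 = (mu/m)(2/r - 1/a).
r_peri = a * (1 - e)
v_peri = math.sqrt(mu / m * (2 / r_peri - 1 / a))

energy = 0.5 * m * v_peri**2 - mu / r_peri
ang_mom = m * r_peri * v_peri               # velocity is perpendicular to r at pericenter

p1 = math.sqrt(m * mu * a)                  # p'_1 from footnote 31
p2 = math.sqrt(m * mu * a * (1 - e**2))     # p'_2, the total angular momentum
n = m * mu**2 / p1**3                       # mean motion, from (5.417)

print(energy, -m * mu**2 / (2 * p1**2))     # equation (5.415); both equal -mu/(2a)
print(ang_mom, p2)
print(n, math.sqrt(mu / (m * a**3)))        # Kepler's third law for the potential -mu/r
```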


31The canonical phase-space coordinates can be written in terms of the
parameters that specify an orbit. We will just summarize the results. For further
explanation see [33] or [35].

Assume we have a bound orbit, with semimajor axis a, eccentricity e,
inclination i, longitude of ascending node Ω, argument of pericenter ω,
and mean anomaly M. The three canonical momenta are $p'_1 = \sqrt{m\mu a}$,
$p'_2 = \sqrt{m\mu a(1 - e^2)}$, and $p'_3 = \sqrt{m\mu a(1 - e^2)}\cos i$. The first momentum
is related to the energy, the second momentum is the total angular momentum,
and the third momentum is the component of the angular momentum
5.8.3 $F_2$ and the Lagrangian


The solution to the Hamilton-Jacobi equation, the mixed variable 
generating function that generates time evolution, is related to the 
action used in the variational principle. In particular, on realizable 
paths the derivative of the generating function has the same value 
as the Lagrangian. 

Let $\bar F_2(t) = F_2(t, q(t), p'(t))$ be the value of $F_2$ along the paths
q and p' at time t. The derivative of $\bar F_2$ is

$$\begin{aligned} D\bar F_2(t) &= \partial_1 F_2(t, q(t), p'(t))\,Dq(t) + \partial_2 F_2(t, q(t), p'(t))\,Dp'(t) + \partial_0 F_2(t, q(t), p'(t)) \\ &= p(t)\,Dq(t) + \partial_2 F_2(t, q(t), p'(t))\,Dp'(t) + \partial_0 F_2(t, q(t), p'(t)), \end{aligned} \tag{5.420}$$

where we have used the relation for p in terms of $F_2$ in the first
term. Using the Hamilton-Jacobi equation (5.371) this becomes

$$\begin{aligned} D\bar F_2(t) &= p(t)\,Dq(t) - H(t, q(t), p(t)) + \partial_2 F_2(t, q(t), p'(t))\,Dp'(t) \\ &= L(t, q(t), Dq(t)) + \partial_2 F_2(t, q(t), p'(t))\,Dp'(t). \end{aligned} \tag{5.421}$$


On realizable paths we have Dp'(t) = 0, so along realizable paths 
the time derivative of Fə is the same as the Lagrangian along 
the path. The time integral of the Lagrangian along any path is 
the action along that path. This means that, up to an additive 
term that is constant on realizable paths but may be a function 
of the transformed phase-space coordinates q' and p', the $F_2$ that
solves the Hamilton-Jacobi equation has the same value as the 
Lagrangian action for realizable paths. 

The same conclusion follows for the Hamilton-Jacobi equation
formulated in terms of $F_1$. Up to an additive term that is constant
on realizable paths but may be a function of the transformed
phase-space coordinates q' and p', the $F_1$ that solves the corresponding
Hamilton-Jacobi equation has the same value as the Lagrangian
action for realizable paths.


in the $\hat z$ direction. The conjugate canonical coordinates are $q'_1 = M$, $q'_2 = \omega$,
and $q'_3 = \Omega$.
Recall that a transformation given by an $F_2$-type generating
function is also given by an $F_1$-type generating function related to
it by a Legendre transform (see equation 5.196):

$$F_1(t, q, q') = F_2(t, q, p') - q'p', \tag{5.422}$$

provided the transformations are non-singular. In this case, both
q' and p' are constant on realizable paths, so the additive constants
that make $F_1$ and $F_2$ equal to the Lagrangian action differ by $q'p'$.


Exercise 5.26: Harmonic oscillator 


Let’s check this for the harmonic oscillator (of course). 


a. Finish the integral (5.382):

$$W(t, x, p') = \int_0^x \sqrt{2m\left(E(p') - \frac{k z^2}{2}\right)}\,dz.$$

Write the result in terms of the amplitude $A = \sqrt{2E(p')/k}$.


b. Check that this generating function gives the transformation

$$x' = \partial_2 W(t, x, p') = \sqrt{\frac{m}{k}}\,DE(p')\,\arcsin\left(\frac{x}{A}\right),$$

which is the same as equation (5.384) for a particular choice of the
integration constant. The other part of the transformation is

$$p = \partial_1 W(t, x, p') = \sqrt{mk}\,\sqrt{A^2 - x^2},$$

with the same definition of A as before.


c. Compute the time derivative of the associated $F_2$ along realizable
paths (Dp' = 0), and compare to the Lagrangian along realizable paths.


5.8.4 The Action Generates Time Evolution 


We define the function $F(t_1, q_1, t_2, q_2)$ to be the value of the action
for a realizable path q such that $q(t_1) = q_1$ and $q(t_2) = q_2$. So F
satisfies

$$F(t_1, q(t_1), t_2, q(t_2)) = S[q](t_1, t_2) = \int_{t_1}^{t_2} L \circ \Gamma[q]. \tag{5.423}$$

For variations η that are not necessarily zero at the end times
and for realizable paths q the variation of the action is

$$\delta_\eta S[q](t_1, t_2) = \left(\partial_2 L \circ \Gamma[q]\right)\eta\Big|_{t_1}^{t_2} = p(t_2)\,\eta(t_2) - p(t_1)\,\eta(t_1). \tag{5.424}$$

Alternatively, the variation of S[q] in equation (5.423) gives

$$\delta_\eta S[q](t_1, t_2) = \partial_1 F(t_1, q(t_1), t_2, q(t_2))\,\eta(t_1) + \partial_3 F(t_1, q(t_1), t_2, q(t_2))\,\eta(t_2). \tag{5.425}$$

Comparing equations (5.424) and (5.425), and using the fact that
the variation η is arbitrary, we find

$$\partial_1 F(t_1, q(t_1), t_2, q(t_2)) = -p(t_1), \qquad \partial_3 F(t_1, q(t_1), t_2, q(t_2)) = p(t_2). \tag{5.426}$$


The partial derivatives of F with respect to the coordinate argu- 
ments give the momenta. Abstracting off paths, we have 


OF (tı, q1, t2, q2) = -pı 
zF (t1, q1, t2, q2) = p2- (5.427) 


This sort of looks like the F; type generating function relations, 
but here there are two times. 

Given a realizable path q such that q(t1) = qı and q(t2) = q2, 
we get the partial derivatives with respect to the time slots: 


o(Sla]) (tı, t2) = =L(tı, a(t), Da(tı)) i 
= OF (ti, q1, t2, q2) + WF (t1, q1, t2, q2) Dq(tı) 
7. OoF (tı, qı, #2, 92) — p(ti)Da(ti). (5.428) 


Therefore 


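$$\begin{aligned} \partial_0 F(t_1, q_1, t_2, q_2) &= p(t_1)\, Dq(t_1) - L(t_1, q(t_1), Dq(t_1)) \\ &= H(t_1, q_1, p_1) \\ &= H(t_1, q_1, -\partial_1 F(t_1, q_1, t_2, q_2)). \end{aligned} \qquad (5.429)$$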


And similarly 


$$\begin{aligned} \partial_2 F(t_1, q_1, t_2, q_2) &= -H(t_2, q_2, p_2) \\ &= -H(t_2, q_2, \partial_3 F(t_1, q_1, t_2, q_2)). \end{aligned} \qquad (5.430)$$
These are a pair of Hamilton-Jacobi equations, evaluated at
the endpoints of the path.

Solving equations (5.427) for q2 and p2 as functions of t2, and 
the initial state t1, q1, p1, we get the time evolution of the system 
in terms of F. The function F generates time evolution. 
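A small numerical check can make relations (5.427) through (5.430) concrete. The sketch below is a hypothetical example that does not appear in the text. It uses a free particle of assumed mass M = 2, whose realizable paths are straight lines, so that F(t1, q1, t2, q2) = M(q2 − q1)²/(2(t2 − t1)). The sketch estimates the partial derivatives of F by central finite differences, which is our choice of method, and then uses F to advance a state as described above.

```python
# Hypothetical illustration: free particle of mass M with L = (1/2) M v^2.
M = 2.0

def action_free(t1, q1, t2, q2):
    """Action F(t1, q1, t2, q2) along the realizable (straight-line) path
    from (t1, q1) to (t2, q2): F = M (q2 - q1)^2 / (2 (t2 - t1))."""
    return M * (q2 - q1) ** 2 / (2.0 * (t2 - t1))

def H(p):
    """Free-particle Hamiltonian H(t, q, p) = p^2 / (2M)."""
    return p * p / (2.0 * M)

def partial(f, args, slot, h=1e-5):
    """Central finite-difference estimate of the partial derivative of f
    with respect to argument `slot`, counting slots from 0 as in the text."""
    up, dn = list(args), list(args)
    up[slot] += h
    dn[slot] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

def evolve(t1, q1, p1, t2):
    """Use F to advance (t1, q1, p1) to time t2: solve the first relation of
    (5.427), d_1 F = -p1, for q2 (closed form for this F), then read p2 off
    the second relation, p2 = d_3 F."""
    q2 = q1 + p1 * (t2 - t1) / M
    p2 = partial(action_free, (t1, q1, t2, q2), 3)
    return q2, p2

args = (0.0, 1.0, 3.0, 7.0)             # (t1, q1, t2, q2)
p = M * (7.0 - 1.0) / (3.0 - 0.0)       # momentum on the straight path
print(partial(action_free, args, 1), -p)     # (5.427): d_1 F = -p1
print(partial(action_free, args, 3), p)      # (5.427): d_3 F =  p2
print(partial(action_free, args, 0), H(p))   # (5.429): d_0 F =  H(t1, q1, p1)
print(partial(action_free, args, 2), -H(p))  # (5.430): d_2 F = -H(t2, q2, p2)
print(evolve(0.0, 1.0, p, 3.0))              # approximately (7.0, 4.0)
```

For a general system, the first relation of (5.427) would have to be solved for q2 numerically. The free particle was chosen only because that inversion has a closed form.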

The function $F$ can be written in terms of the $F_2$ or $F_1$ that
solves the Hamilton-Jacobi equation. We can compute time evolution
by using the $F_2$ solution of the Hamilton-Jacobi equation
to express the state $(t_1, q_1, p_1)$ in terms of the constants $q'$ and $p'$
at a given time $t_1$. We can then perform a subsequent transformation
back from $q'$, $p'$ to the original state variables at a different
time $t_2$, giving the state $(t_2, q_2, p_2)$. The composition of canonical
transformations is canonical. The generating function for the
composition is the difference of the generating functions for each
step:


$$F(t_1, q_1, t_2, q_2) = F_2(t_2, q_2, p') - F_2(t_1, q_1, p'), \qquad (5.431)$$
with the condition

$$\partial_2 F_2(t_2, q_2, p') - \partial_2 F_2(t_1, q_1, p') = 0, \qquad (5.432)$$
which allows us to eliminate $p'$.


Exercise 5.27: Uniform acceleration 


a. Compute the Lagrangian action, as a function of the endpoints and 
times, for a uniformly accelerated particle. Use this to construct the 
canonical transformation for time evolution from a given initial state. 


b. Solve the Hamilton-Jacobi equation for the uniformly accelerated
particle, obtaining the $F_2$ that makes the transformed Hamiltonian zero.
Show that the Lagrangian action can be expressed as a difference of two
applications of this $F_2$.


5.9 Lie Transforms 


The evolution of a system under any Hamiltonian generates a con- 
tinuous family of canonical transformations. To study the behav- 
ior of some system governed by a Hamiltonian H it is sometimes 
appropriate to use a canonical transformation generated by evo- 
lution governed by another Hamiltonian-like function W on the 
same phase space. Such a canonical transformation is called a Lie 
transform. 
[Figure 5.9: diagram of the maps $C_{\Delta,H}$ and $C'_{\epsilon,W}$ acting on the points $(t_0, q_0, p_0)$ and $(t_0, q_0', p_0')$]


Figure 5.9 Time evolution of a trajectory started at the point
$(t_0, q_0, p_0)$, governed by the Hamiltonian $H$, is transformed by the Lie
transform governed by the generator $W$. The time evolution of the
transformed trajectory is governed by the Hamiltonian $H'$.


The functions H and W are both Hamiltonian-shaped functions 
defined on the same phase space. Time evolution for an interval 
$\Delta$ governed by $H$ is a canonical transformation $C_{\Delta,H}$. Evolution
by $W$ for an interval $\epsilon$ is a canonical transformation $C'_{\epsilon,W}$:

$$(t, q, p) = C'_{\epsilon,W}(t, q', p'). \qquad (5.433)$$


The independent variable in the $H$ evolution is time, and the independent
variable in the $W$ evolution is an arbitrary parameter of
the canonical transformation. We write the primed $C'$ for the $W$ evolution
to indicate that the canonical transformation induced by $W$ does not change
the time in the system governed by $H$.

Figure 5.9 shows how a Lie transform is used to transform a 
trajectory. We can see from the diagram that the canonical trans- 
formations obey the relation: 


$$C'_{\epsilon,W} \circ C_{\Delta,H'} = C_{\Delta,H} \circ C'_{\epsilon,W}. \qquad (5.434)$$


For generators $W$ that do not depend on the independent variable,
the resulting canonical transformation $C'_{\epsilon,W}$ is time-independent
and symplectic. For a time-independent symplectic transformation,
the transformation is canonical if the Hamiltonian transforms
by composition:³²

$$H' = H \circ C'_{\epsilon,W}. \qquad (5.435)$$
We will only work with Lie transforms with generators that are 
independent of the independent variable. 


Lie transforms of functions 

The value of a phase-space function $F$ changes if its arguments
change. We define the function $E'_{\epsilon,W} F$ of a function $F$ of phase-space
coordinates $(t, q, p)$ by

$$E'_{\epsilon,W} F = F \circ C'_{\epsilon,W}. \qquad (5.436)$$

We say that $E'_{\epsilon,W} F$ is the Lie transform of the function $F$.

In particular, the Lie transform advances the coordinate and
momentum selector functions $Q = I_1$ and $P = I_2$:

$$\begin{aligned} (E'_{\epsilon,W} Q)(t, q', p') &= (Q \circ C'_{\epsilon,W})(t, q', p') = Q(t, q, p) = q \\ (E'_{\epsilon,W} P)(t, q', p') &= (P \circ C'_{\epsilon,W})(t, q', p') = P(t, q, p) = p. \end{aligned} \qquad (5.437)$$

So we may restate equation (5.436) as:

$$(E'_{\epsilon,W} F)(t, q', p') = F\big(t, (E'_{\epsilon,W} Q)(t, q', p'), (E'_{\epsilon,W} P)(t, q', p')\big). \qquad (5.438)$$

More generally, Lie transforms descend into compositions:

$$E'_{\epsilon,W}(F \circ G) = F \circ (E'_{\epsilon,W} G). \qquad (5.439)$$


32 In general, the generator $W$ could depend on its independent variable. If
so, it would be necessary to specify a rule that gives the initial value of the
independent variable for the $W$ evolution. This rule may or may not depend
upon the time. If the specification of the independent variable for the $W$
evolution does not depend on time, then the resulting canonical transformation
$C'_{\epsilon,W}$ is time-independent and the Hamiltonians transform by composition. If
the generator $W$ depends on its independent variable and the rule for
specifying its initial value depends on time, then the transformation $C'_{\epsilon,W}$ is
time-dependent. In this case there may need to be an adjustment to the relation
between the Hamiltonians $H$ and $H'$. In the extended phase space all these
complications disappear. There is only one case. We can assume all generators
$W$ are independent of the independent variable.
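In terms of $E'_{\epsilon,W}$ we have the canonical transformation:

$$\begin{aligned} q &= (E'_{\epsilon,W} Q)(t, q', p') \\ p &= (E'_{\epsilon,W} P)(t, q', p') \\ H' &= E'_{\epsilon,W} H. \end{aligned} \qquad (5.440)$$

We can also say

$$(t, q, p) = (E'_{\epsilon,W} I)(t, q', p'), \qquad (5.441)$$

where $I$ is the phase-space identity function: $I(t, q, p) = (t, q, p)$.
Note that $E'_{\epsilon,W}$ has the property:³³

$$E'_{\epsilon_1+\epsilon_2,W} = E'_{\epsilon_1,W} \circ E'_{\epsilon_2,W} = E'_{\epsilon_2,W} \circ E'_{\epsilon_1,W}. \qquad (5.442)$$

The identity operator $I$ is

$$I = E'_{0,W}. \qquad (5.443)$$
We can define the inverse function

$$(E'_{\epsilon,W})^{-1} = E'_{-\epsilon,W} \qquad (5.444)$$
with the property

$$I = E'_{\epsilon,W} \circ (E'_{\epsilon,W})^{-1} = (E'_{\epsilon,W})^{-1} \circ E'_{\epsilon,W}. \qquad (5.445)$$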


Simple Lie transforms 
For example, suppose we are studying a system for which a rota- 
tion would be a helpful transformation. To concoct such a trans- 
formation we note that we intend a configuration coordinate to 
increase uniformly with a given rate. In this case we want an 
angle to be incremented. The Hamiltonian which consists solely 
of the momentum conjugate to that configuration coordinate al- 
ways does the job. So the angular momentum is an appropriate 
generator for rotations. 

The analysis is simple if we use polar coordinates $r, \theta$ with conjugate
momenta $p_r, p_\theta$. The generator $W$ is just:

$$W(\tau; r, \theta; p_r, p_\theta) = p_\theta. \qquad (5.446)$$


33 The set of transformations $E'_{\epsilon,W}$, with composition as the operation and with
parameter $\epsilon$, is a one-parameter Lie group.
The family of transformations satisfies Hamilton's equations:

$$\begin{aligned} Dr &= 0 \\ D\theta &= 1 \\ Dp_r &= 0 \\ Dp_\theta &= 0 \end{aligned} \qquad (5.447)$$


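Since the only variable that appears in $W$ is $p_\theta$, $\theta$ is the only
variable that varies as $\epsilon$ is varied. In fact, the family of canonical
transformations is:

$$\begin{aligned} r &= r' \\ \theta &= \theta' + \epsilon \\ p_r &= p_r' \\ p_\theta &= p_\theta' \end{aligned} \qquad (5.448)$$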


So angular momentum is the generator of a canonical rotation. 

The example is simple, but it illustrates one important feature 
of Lie transformations—they give one set of variables entirely in 
terms of the other set of variables. This should be contrasted with 
the mixed-variable generating function transformations which al- 
ways give a mixture of old and new variables in terms of a mixture 
of new and old variables, and thus require an inversion to get one 
set of variables in terms of the other set of variables. This in- 
verse can only be written in closed form for special cases. In 
general there is considerable advantage in using a transformation 
rule that generates explicit transformations from the start. The 
Lie transformations are always explicit, in the sense that they give 
one set of variables in terms of the other, but for there to be ex- 
plicit expressions the evolution governed by the generator must 
be solvable. 

Let’s consider another example. This time consider a three 
degree of freedom problem in rectangular coordinates, and take 
the generator of the transformation to be the z component of the 
angular momentum: 


$$W(\tau; x, y, z; p_x, p_y, p_z) = x p_y - y p_x \qquad (5.449)$$
The evolution equations are 


$$\begin{aligned} Dx &= -y \\ Dy &= x \\ Dz &= 0 \\ Dp_x &= -p_y \\ Dp_y &= p_x \\ Dp_z &= 0 \end{aligned} \qquad (5.450)$$


We notice that $z$ and $p_z$ are unchanged, and that the equations
governing the evolution of $x$ and $y$ decouple from those of $p_x$ and
$p_y$. Each of these pairs of equations represents simple harmonic
motion, as can be seen by writing them as second-order systems.
The solutions are


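$$\begin{aligned} x &= x' \cos\epsilon - y' \sin\epsilon \\ y &= x' \sin\epsilon + y' \cos\epsilon \\ z &= z' \end{aligned} \qquad (5.451)$$

$$\begin{aligned} p_x &= p_x' \cos\epsilon - p_y' \sin\epsilon \\ p_y &= p_x' \sin\epsilon + p_y' \cos\epsilon \\ p_z &= p_z' \end{aligned} \qquad (5.452)$$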
So we see that again a component of the angular momentum gen- 
erates a canonical rotation. There was nothing special about our 


choice of axes, so we can deduce that the component of angular 
momentum about any axis generates rotations about that axis. 
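The claim that (5.451) and (5.452) solve the evolution equations (5.450) can be checked numerically. The sketch below integrates Hamilton's equations for W = x p_y − y p_x over a parameter interval ε and compares the result with the closed-form rotation. It assumes classical fourth-order Runge-Kutta integration, which is our choice and not the text's. It also checks the composition property (5.442) for this family.

```python
import math

def w_flow(s):
    """Hamilton's equations (5.450) for W = x p_y - y p_x; the independent
    variable is the transformation parameter eps, not the time."""
    x, y, z, px, py, pz = s
    return (-y, x, 0.0, -py, px, 0.0)

def rk4(f, s, eps, steps=1000):
    """Integrate ds/d(eps) = f(s) from 0 to eps with classical RK4."""
    h = eps / steps
    for _ in range(steps):
        k1 = f(s)
        k2 = f(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
        k3 = f(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
        k4 = f(tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
    return s

def rotation(s, eps):
    """Closed-form transformation (5.451)-(5.452): primed -> unprimed."""
    x, y, z, px, py, pz = s
    c, sn = math.cos(eps), math.sin(eps)
    return (x * c - y * sn, x * sn + y * c, z,
            px * c - py * sn, px * sn + py * c, pz)

def max_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

start = (1.0, 2.0, 3.0, 0.5, -1.0, 0.25)   # (x', y', z', p'_x, p'_y, p'_z)
print(max_diff(rk4(w_flow, start, 0.7), rotation(start, 0.7)))   # tiny
# Composition property (5.442): advancing by 0.3 then 0.4 equals advancing by 0.7.
print(max_diff(rotation(rotation(start, 0.3), 0.4), rotation(start, 0.7)))
```

The starting point and the interval 0.7 are arbitrary. The step count of 1000 keeps the RK4 error far below the tolerance used in the comparison.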


Example 
Suppose we have a system governed by the Hamiltonian 


$$H(t; x, y; p_x, p_y) = \tfrac{1}{2}(p_x^2 + p_y^2) + \tfrac{1}{2}a(x - y)^2 + \tfrac{1}{2}b(x + y)^2. \qquad (5.453)$$


Hamilton’s equations couple the motion of x and y 


$$\begin{aligned} Dx &= p_x \\ Dy &= p_y \\ Dp_x &= -a(x - y) - b(x + y) \\ Dp_y &= a(x - y) - b(x + y). \end{aligned} \qquad (5.454)$$


We can decouple the system by performing a coordinate rotation
by $\pi/4$. This is generated by


$$W(\tau; x, y; p_x, p_y) = x p_y - y p_x, \qquad (5.455)$$
which is similar to the one above but without the $z$ degree of
freedom. Evolving $(\tau; x, y; p_x, p_y)$ by $W$ for an interval of $\pi/4$
gives a canonical rotation:


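$$\begin{aligned} x &= x' \cos(\pi/4) - y' \sin(\pi/4) \\ y &= x' \sin(\pi/4) + y' \cos(\pi/4) \\ p_x &= p_x' \cos(\pi/4) - p_y' \sin(\pi/4) \\ p_y &= p_x' \sin(\pi/4) + p_y' \cos(\pi/4). \end{aligned} \qquad (5.456)$$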


Composing the Hamiltonian H with this time independent trans- 
formation gives the new Hamiltonian 


$$H'(t; x', y'; p_x', p_y') = \left(\tfrac{1}{2}(p_x')^2 + b(x')^2\right) + \left(\tfrac{1}{2}(p_y')^2 + a(y')^2\right), \qquad (5.457)$$


which is a Hamiltonian for two uncoupled harmonic oscillators. 
So the original coupled problem has been transformed by a Lie 
transform to a new form for which the solution is easy. 
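The decoupling can be checked by brute force. Compose H from (5.453) with the rotation (5.456) and compare the result with (5.457) at random points. This numerical sketch is not part of the text, and the values of a and b are arbitrary.

```python
import math
import random

def H(x, y, px, py, a, b):
    """Coupled Hamiltonian (5.453), time-independent."""
    return 0.5 * (px**2 + py**2) + 0.5 * a * (x - y)**2 + 0.5 * b * (x + y)**2

def H_prime(x, y, px, py, a, b):
    """Decoupled Hamiltonian (5.457) in the rotated variables."""
    return (0.5 * px**2 + b * x**2) + (0.5 * py**2 + a * y**2)

def rotate(x, y, px, py, angle=math.pi / 4):
    """Canonical rotation (5.456): maps (x', y', p'_x, p'_y) to (x, y, p_x, p_y)."""
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c, px * c - py * s, px * s + py * c)

random.seed(0)
a, b = 1.3, 0.4                      # arbitrary coupling constants
worst = 0.0
for _ in range(1000):
    new = [random.uniform(-2.0, 2.0) for _ in range(4)]   # (x', y', p'_x, p'_y)
    old = rotate(*new)
    worst = max(worst, abs(H(*old, a, b) - H_prime(*new, a, b)))
print(worst)   # at the level of floating-point roundoff
```

Reading off (5.457), each term has the form ½p² + κx² with unit mass. So x' oscillates with angular frequency √(2b) and y' with angular frequency √(2a).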


5.10 Lie Series 


Taylor’s theorem gives us a way of approximating the value of a 
nice enough function at a point near to a point where the value 
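is known. If we know $f$ and all of its derivatives at $t$, then we can
get the value of $f(t + \epsilon)$ for small enough $\epsilon$, as follows:

$$f(t + \epsilon) = f(t) + \epsilon Df(t) + \frac{1}{2!}\epsilon^2 D^2 f(t) + \cdots + \frac{1}{n!}\epsilon^n D^n f(t) + \cdots \qquad (5.458)$$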


We also recall that the power series for the exponential function 
is: 


$$e^x = 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + \cdots \qquad (5.459)$$


This suggests that we can formally construct a Taylor-series operator
as the exponential of a differential operator:³⁴

$$e^{\epsilon D} = I + \epsilon D + \frac{1}{2!}(\epsilon D)^2 + \cdots + \frac{1}{n!}(\epsilon D)^n + \cdots \qquad (5.460)$$


34 We are playing fast and loose with differential operators here. In a formal
treatment it is essential to prove that these games are mathematically well- 
defined and have appropriate convergence properties. 
with the goal that we will be able to write

$$f(t + \epsilon) = (e^{\epsilon D} f)(t). \qquad (5.461)$$


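We have to be a bit careful here: $(\epsilon D)^2 = \epsilon D \epsilon D$. We can only turn
it into $\epsilon^2 D^2$ because $\epsilon$ is a scalar constant, which must commute
with every differential operator. But with this caveat in mind we
can define the differential operator

$$(e^{\epsilon D} f)(t) = f(t) + \epsilon Df(t) + \frac{\epsilon^2}{2!} D^2 f(t) + \cdots + \frac{\epsilon^n}{n!} D^n f(t) + \cdots \qquad (5.462)$$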


Before going on, it is interesting to compute with these a bit.
In the code transcripts that follow we develop the series by
exponentiation. We can incrementally examine the series by looking
at successive elements of the (infinite) sequence of terms of the
series. The procedure series:for-each is an incremental traverser
that applies its first argument to successive elements of the series
given as its second argument. The third argument (when
given) specifies the number of terms to be traversed. In each of
the following transcripts we print simplified expressions for the
successive terms.

The first thing to look at is the general Taylor expansion for
an unknown literal function, expanded around $t$, with increment
$\epsilon$. Understanding what we see in this simple problem will help us
understand what we will see in more complex problems later.


(series:for-each print-expression
  (((exp (* 'epsilon D))
    (literal-function 'f))
   't)
  6)

(f t)
(* ((D f) t) epsilon)
(* 1/2 (((expt D 2) f) t) (expt epsilon 2))
(* 1/6 (((expt D 3) f) t) (expt epsilon 3))
(* 1/24 (((expt D 4) f) t) (expt epsilon 4))
(* 1/120 (((expt D 5) f) t) (expt epsilon 5))


We can also look at the expansions of particular functions that 
we recognize, such as the expansion of sin around 0. 
(series:for-each print-expression
  (((exp (* 'epsilon D)) sin) 0)
  6)

0
epsilon
0
(* -1/6 (expt epsilon 3))
0
(* 1/120 (expt epsilon 5))


It is often instructive to expand functions we usually don't
remember, such as $f(x) = \sqrt{1 + x}$.


(series:for-each print-expression
  (((exp (* 'epsilon D))
    (lambda (x) (sqrt (+ x 1))))
   0)
  6)

1
(* 1/2 epsilon)
(* -1/8 (expt epsilon 2))
(* 1/16 (expt epsilon 3))
(* -5/128 (expt epsilon 4))
(* 7/256 (expt epsilon 5))
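The coefficients in the last transcript can be reproduced with exact rational arithmetic, without the scmutils system. Unlike the transcript, this sketch does not differentiate symbolically. It assumes the closed form of the derivatives of √(1 + x) at 0, (Dⁿf)(0) = (1/2)(1/2 − 1)⋯(1/2 − n + 1). It then sums the truncated series (e^{εD} f)(0) for a small ε.

```python
from fractions import Fraction
import math

def sqrt1p_taylor_coeffs(n_terms):
    """Coefficients c_n = (D^n f)(0) / n! of f(x) = sqrt(1 + x).

    (D^n f)(0) = (1/2)(1/2 - 1)...(1/2 - n + 1), so successive coefficients
    satisfy c_n = c_{n-1} * (1/2 - (n - 1)) / n.
    """
    half = Fraction(1, 2)
    coeffs = [Fraction(1)]
    for n in range(1, n_terms):
        coeffs.append(coeffs[-1] * (half - (n - 1)) / n)
    return coeffs

def advance(coeffs, eps):
    """Partial sum of (e^{eps D} f)(0) = sum over n of c_n eps^n."""
    return sum(float(c) * eps ** n for n, c in enumerate(coeffs))

coeffs = sqrt1p_taylor_coeffs(6)
print([str(c) for c in coeffs])   # ['1', '1/2', '-1/8', '1/16', '-5/128', '7/256']
print(advance(coeffs, 0.1), math.sqrt(1.1))
```

With six terms and ε = 0.1, the partial sum agrees with √1.1 to about eight digits. The first omitted term is of order 2 × 10⁻⁸.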


Exercise 5.28: Binomial series 


Develop the binomial expansion of $(1 + x)^n$ as a Taylor expansion. Of
course, it must be the case that for $n$ a positive integer all of the
coefficients except for the first $n + 1$ are zero. However, in the general case,
for symbolic $n$, the coefficients are rather complicated polynomials in $n$.
For example, you will find that the coefficient of the $x^7$ term is:

(+ (* 1/5040 (expt n 7))
   (* -1/240 (expt n 6))
   (* 5/144 (expt n 5))
   (* -7/48 (expt n 4))
   (* 29/90 (expt n 3))
   (* -7/20 (expt n 2))
   (* 1/7 n))

These terms must evaluate to the entries in Pascal's triangle. In
particular, this polynomial must be zero for integers $0 \le n < 7$. How is this arranged?
Dynamics

Now to play this game with dynamical functions we want to pro- 
vide a derivative-like operator that we can exponentiate, which 
will give us the advance operator. The key idea is to write the 
derivative of the function in terms of the Poisson bracket. Equa- 
tion (3.75) shows how to do this in general: 


$$D(F \circ \sigma) = (\{F, H\} + \partial_0 F) \circ \sigma \qquad (5.463)$$


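We define the operator $D_H$ by

$$D_H F = \partial_0 F + \{F, H\}, \qquad (5.464)$$
so

$$D_H F \circ \sigma = D(F \circ \sigma), \qquad (5.465)$$

and iterates of this operator can be used to compute higher-order
derivatives:

$$D^n(F \circ \sigma) = D_H^n F \circ \sigma. \qquad (5.466)$$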


Thus we can rewrite the advance of the path function $f = F \circ \sigma$
for an interval $\epsilon$ with respect to $H$ as a power series in the
derivative operator $D_H$ applied to the phase-space function $F$ and
then composed with the path:

$$f(t + \epsilon) = (e^{\epsilon D} f)(t) = (e^{\epsilon D_H} F) \circ \sigma(t). \qquad (5.467)$$


Indeed, we can implement the time-advance operator with this 
series when it converges. 


Exercise 5.29: Iterated derivatives 


Show that equation (5.466) is correct. 


Exercise 5.30: Lagrangian analog 
Compare $D_H$ with the total time derivative operator. Recall that

$$D_t F \circ \Gamma[q] = D(F \circ \Gamma[q])$$

abstracts the derivative of a function of a path through state space to
a function of the derivatives of the path. Define another derivative
operator $D_L$, analogous to $D_H$, that would give the time derivative of
functions along Lagrangian state paths that are solutions of Lagrange's
equations for a given Lagrangian. How might this be useful?
Let $H$ be a Hamiltonian. If $F$ and $H$ are both time-independent,
we can simplify the computation of the advance of $F$. In this case
we define the Lie derivative operator $L_H$ such that

$$L_H F = \{F, H\}, \qquad (5.468)$$
which reads "the Lie derivative of $F$ with respect to $H$."³⁵ So

$$D_H = \partial_0 + L_H, \qquad (5.469)$$
and for time-independent $F$

$$D(F \circ \sigma) = L_H F \circ \sigma. \qquad (5.470)$$
We can iterate this process to compute higher derivatives. So

$$L_H^2 F = \{\{F, H\}, H\}, \qquad (5.471)$$

and successively higher-order Poisson brackets of $F$ with $H$ give
successively higher-order derivatives when evaluated on the trajectory.

Let $f = F \circ \sigma$. Then

$$\begin{aligned} Df &= (L_H F) \circ \sigma && (5.472) \\ D^2 f &= (L_H^2 F) \circ \sigma && (5.473) \\ &\;\;\vdots && (5.474) \end{aligned}$$


Thus we can rewrite the advance of the path function $f$ for an
interval $\epsilon$ with respect to $H$ as a power series in the Lie derivative
operator applied to the phase-space function $F$ and then composed
with the path:

$$f(t + \epsilon) = (e^{\epsilon D} f)(t) = (e^{\epsilon L_H} F) \circ \sigma(t). \qquad (5.475)$$
We can implement the time-advance operator $E'_{\epsilon,H}$ with the series

$$E'_{\epsilon,H} F = e^{\epsilon L_H} F, \qquad (5.476)$$


35 Our $L_H$ is a special case of what is referred to as a Lie derivative in differential
geometry. The more general idea is that a vector field defines a flow.
The Lie derivative of an object with respect to a vector field gives the rate of 
change of the object as it is dragged along with the flow. In our case the flow 
is the evolution generated by Hamilton’s equations, with Hamiltonian H. 
when this series converges.

We have shown that time evolution is canonical, so the series 
above are formal representations of canonical transformations as 
power series in the time. These series may not converge, even if 
the evolution governed by the Hamiltonian H is well defined. 


Computing Lie series 

We can use the Lie transform as a computational tool to locally 
examine the evolution of dynamical systems. We define the Lie 
derivative of $F$, as a derivative-like operator, relative to the given
Hamiltonian function, $H$:³⁶


(define ((Lie-derivative H) F) 
(Poisson-bracket F H)) 


We also define a procedure to implement the Lie transform:³⁷


(define (Lie-transform H t) 
(exp (* t (Lie-derivative H)))) 


Let’s start by examining the beginning of the Lie series for the 
position of a simple harmonic oscillator of mass m and spring 
constant k. Note that we make up the Lie transform (series) 
operator by passing it an appropriate Hamiltonian function and 
an interval to evolve for. The resulting operator is then given the 
position selector procedure. The Lie transform operator returns
the new position selector procedure, which, when given the phase-space
coordinates x0 and p0, returns the position selected from the
result of advancing those coordinates by the interval dt.


36 Actually, we define the Lie derivative slightly differently, as follows: 


(define ((Lie-derivative-procedure H) F)
  (Poisson-bracket F H))
(define Lie-derivative
  (make-operator Lie-derivative-procedure 'Lie-derivative))


The reason is that we want Lie-derivative to be an operator, which is just like 
a function except that the product of operators is interpreted as composition 
while the product of functions is the function computing the product of their 
values. 


37 The Lie-transform procedure here is also defined to be an operator, just
like Lie-derivative, but in this case the operator declaration is purely formal 
because the exp procedure will produce a series, and we do not currently have 
a way of iterating that process. 
(series:for-each print-expression
  (((Lie-transform (H-harmonic 'm 'k) 'dt)
    coordinate)
   (up 0 'x0 'p0))
  6)

x0
(/ (* dt p0) m)
(/ (* -1/2 (expt dt 2) k x0) m)
(/ (* -1/6 (expt dt 3) k p0) (expt m 2))
(/ (* 1/24 (expt dt 4) (expt k 2) x0) (expt m 2))
(/ (* 1/120 (expt dt 5) (expt k 2) p0) (expt m 3))


We should recognize the terms of this series. We start with the initial
position $x_0$. The first-order correction $(p_0/m)\,dt$ is due to the
initial velocity. Next we find an acceleration term $(-k x_0/2m)\,dt^2$
due to the restoring force of the spring at the initial position.

The Lie transform is just as appropriate for showing us how the 
momentum evolves over the interval: 


(series:for-each print-expression
  (((Lie-transform (H-harmonic 'm 'k) 'dt)
    momentum)
   (up 0 'x0 'p0))
  6)

p0
(* -1 dt k x0)
(/ (* -1/2 (expt dt 2) k p0) m)
(/ (* 1/6 (expt dt 3) (expt k 2) x0) m)
(/ (* 1/24 (expt dt 4) (expt k 2) p0) (expt m 2))
(/ (* -1/120 (expt dt 5) (expt k 3) x0) (expt m 2))


In this series we see how the initial momentum $p_0$ is corrected by
the effect of the restoring force $-k x_0\,dt$, etc.
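The pattern in these two transcripts can be reproduced without symbolic algebra. For the harmonic oscillator H = p²/(2m) + kx²/2, the Lie derivative maps linear functions of x and p to linear functions, because {x, H} = p/m and {p, H} = −kx. The sketch below is our own illustration, not the scmutils implementation. It stores a linear function ax + bp as the pair (a, b) and sums the Lie series term by term. It then compares a long partial sum with the exact solution x0 cos ωt + (p0/mω) sin ωt, where ω = √(k/m). The names COORDINATE and MOMENTUM echo the scmutils selectors but are hypothetical stand-ins.

```python
import math
from fractions import Fraction

COORDINATE = (1, 0)   # the function x, stored as coefficients (a, b) of a*x + b*p
MOMENTUM = (0, 1)     # the function p

def lie_derivative(c, m, k):
    """L_H F = {F, H} for F = a*x + b*p and H = p^2/(2m) + k x^2/2.
    Since {x, H} = p/m and {p, H} = -k x, (a, b) maps to (-b*k, a/m)."""
    a, b = c
    return (-b * k, a / m)

def lie_series(c, m, k, dt, x0, p0, n_terms):
    """First n_terms of (exp(dt L_H) F)(x0, p0), one term per power of dt."""
    terms = []
    factor = 1                        # dt^n / n!
    for n in range(n_terms):
        a, b = c
        terms.append(factor * (a * x0 + b * p0))
        c = lie_derivative(c, m, k)
        factor = factor * dt / (n + 1)
    return terms

# Exact rational arithmetic reproduces the transcript's terms: the second
# and third position terms are dt*p0/m and -dt^2*k*x0/(2m).
m, k, dt, x0, p0 = Fraction(2), Fraction(3), Fraction(1, 10), Fraction(1), Fraction(5)
x_terms = lie_series(COORDINATE, m, k, dt, x0, p0, 6)
print(x_terms[1] == dt * p0 / m, x_terms[2] == -dt**2 * k * x0 / (2 * m))

# With floats, a long partial sum approaches the exact solution.
mf, kf, t, xf, pf = 2.0, 3.0, 0.5, 1.0, 5.0
w = math.sqrt(kf / mf)
x_sum = sum(lie_series(COORDINATE, mf, kf, t, xf, pf, 40))
p_sum = sum(lie_series(MOMENTUM, mf, kf, t, xf, pf, 40))
print(x_sum - (xf * math.cos(w * t) + pf / (mf * w) * math.sin(w * t)))
print(p_sum - (pf * math.cos(w * t) - mf * w * xf * math.sin(w * t)))
```

The representation as coefficient pairs works only because L_H preserves linear functions for this particular Hamiltonian. For a general H, such as the central-field example that follows, the Lie derivative generates ever more complicated expressions, which is why the text uses symbolic algebra.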

What is a bit more fun is to see how a more complex phase- 
space function is treated by the Lie series expansion. In the ex- 
periment below we examine the Lie series developed by advancing 
the harmonic-oscillator Hamiltonian, by the transform generated 
by the same harmonic-oscillator Hamiltonian: 
(series:for-each print-expression
  (((Lie-transform (H-harmonic 'm 'k) 'dt)
    (H-harmonic 'm 'k))
   (up 0 'x0 'p0))
  6)

(/ (+ (* 1/2 k m (expt x0 2)) (* 1/2 (expt p0 2))) m)
0
0
0
0
0


As we would hope, the series shows us the original energy expression
$(k/2)x_0^2 + (1/2m)p_0^2$ as the first term. Each subsequent
correction term turns out to be zero, because the energy is conserved.

Of course, the Lie series can be used in much more complex 
situations where we want to see the expansion of the motion of a 
system characterized by a more complex Hamiltonian. The planar 
motion of a particle in a general central field is a simple problem 
for which the Lie series is instructive. In the following transcript 
we can see how rapidly the series becomes complicated. It is 
worth one’s while to try to interpret the additive parts of the 
third (acceleration) term shown below: 


(series:for-each print-expression
  (((Lie-transform
     (H-central-polar 'm (literal-function 'U))
     'dt)
    coordinate)
   (up 0
       (up 'r_0 'phi_0)
       (down 'p_r_0 'p_phi_0)))
  4)


(up r_0 phi_0)
(up (/ (* dt p_r_0) m)
    (/ (* dt p_phi_0) (* m (expt r_0 2))))
(up
 (+ (/ (* -1/2 ((D U) r_0) (expt dt 2)) m)
    (/ (* 1/2 (expt dt 2) (expt p_phi_0 2))
       (* (expt m 2) (expt r_0 3))))
 (/ (* -1 (expt dt 2) p_phi_0 p_r_0)
    (* (expt m 2) (expt r_0 3))))
(up
 (+ (/ (* -1/6 (((expt D 2) U) r_0) (expt dt 3) p_r_0)
       (expt m 2))
    (/ (* -1/2 (expt dt 3) (expt p_phi_0 2) p_r_0)
       (* (expt m 3) (expt r_0 4))))
 (+ (/ (* 1/3 ((D U) r_0) (expt dt 3) p_phi_0)
       (* (expt m 2) (expt r_0 3)))
    (/ (* -1/3 (expt dt 3) (expt p_phi_0 3))
       (* (expt m 3) (expt r_0 6)))
    (/ (* (expt dt 3) p_phi_0 (expt p_r_0 2))
       (* (expt m 3) (expt r_0 4)))))


Of course, if we know the closed form Lie transform it is prob- 
ably a good idea to take advantage of it, but when we do not 
know the closed form the Lie series representation of it can come 
in handy. 


5.11 Exponential Identities 


The composition of Lie transforms can be written as products of 
exponentials of Lie derivative operators. In general, Lie deriva- 
tive operators do not commute. If A and B are non-commuting 
operators, then the exponents do not combine in the usual way: 


$$e^A e^B \neq e^{A+B}. \qquad (5.477)$$


So it will be helpful to recall some results about exponentials of 
non-commuting operators. 
We introduce the commutator 


$$[A, B] = AB - BA. \qquad (5.478)$$
The commutator is bilinear and satisfies the Jacobi identity

$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0, \qquad (5.479)$$

which is true for all $A$, $B$, and $C$.
We introduce a notation $\Delta_A$ for the commutator with respect
to the operator $A$:

$$\Delta_A B = [A, B]. \qquad (5.480)$$
In terms of $\Delta$ the Jacobi identity is the same as

$$[\Delta_A, \Delta_B] = \Delta_{[A,B]}. \qquad (5.481)$$
An important identity is

$$e^C A e^{-C} = e^{\Delta_C} A = A + [C, A] + \frac{1}{2}[C, [C, A]] + \cdots. \qquad (5.482)$$
We can check this term by term.
We see that

$$e^C A^2 e^{-C} = e^C A e^{-C} e^C A e^{-C} = (e^C A e^{-C})^2, \qquad (5.483)$$

using $e^{-C} e^C = I$, the identity operator. Using the same trick,

$$e^C A^n e^{-C} = (e^C A e^{-C})^n. \qquad (5.484)$$
More generally, if $f$ can be represented as a power series, then

$$e^C f(A, B, \ldots) e^{-C} = f(e^C A e^{-C}, e^C B e^{-C}, \ldots). \qquad (5.485)$$
For instance, applying this to the exponential function,

$$e^C e^A e^{-C} = e^{e^C A e^{-C}}. \qquad (5.486)$$
Using equation (5.482) we can rewrite this as

$$e^C e^A e^{-C} = e^{e^{\Delta_C} A}. \qquad (5.487)$$
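Identities (5.482) and (5.487), and the failure (5.477) of the usual exponent rule, are easy to probe numerically with small matrices standing in for the operators. In this sketch the matrices A, B, and C are arbitrary choices. Both the matrix exponential and the series e^{Δ_C}A are summed as truncated power series, which is adequate only because the entries are small.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(X, Y, s=1.0):
    """Return X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * x for x in row] for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(X, terms=40):
    """Matrix exponential from its truncated power series."""
    result, term = identity(len(X)), identity(len(X))
    for n in range(1, terms):
        term = scale(matmul(term, X), 1.0 / n)
        result = matadd(result, term)
    return result

def commutator(X, Y):
    """[X, Y] = XY - YX, as in (5.478)."""
    return matadd(matmul(X, Y), matmul(Y, X), -1.0)

def exp_delta(C, A, terms=40):
    """e^{Delta_C} A = A + [C, A] + (1/2)[C, [C, A]] + ..., as in (5.482)."""
    result, term = A, A
    for n in range(1, terms):
        term = scale(commutator(C, term), 1.0 / n)
        result = matadd(result, term)
    return result

def max_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.1, 0.7], [-0.3, 0.2]]
B = [[0.0, 0.4], [0.5, -0.1]]
C = [[0.3, -0.2], [0.6, 0.1]]
exp_C, exp_minus_C = expm(C), expm(scale(C, -1.0))

# (5.482): e^C A e^{-C} equals the commutator series.
print(max_diff(matmul(matmul(exp_C, A), exp_minus_C), exp_delta(C, A)))
# (5.487): e^C e^A e^{-C} equals exp(e^{Delta_C} A).
print(max_diff(matmul(matmul(exp_C, expm(A)), exp_minus_C), expm(exp_delta(C, A))))
# (5.477): for these non-commuting A and B, e^A e^B differs from e^{A+B}.
print(max_diff(matmul(expm(A), expm(B)), expm(matadd(A, B))))
```

The first two differences are at roundoff level. The third is of the size of ½[A, B], which is the leading correction that Exercise 5.32 asks about.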


Exercise 5.31: Commutators of Lie derivatives 

a. Let $W$ and $W'$ be two phase-space state functions. Use the Poisson
bracket Jacobi identity to show

$$[L_W, L_{W'}] = -L_{\{W, W'\}}. \qquad (5.488)$$


b. Consider the phase-space state functions that give the components
of the angular momentum in terms of rectangular canonical coordinates

$$\begin{aligned} J_x(t; x, y, z; p_x, p_y, p_z) &= y p_z - z p_y \\ J_y(t; x, y, z; p_x, p_y, p_z) &= z p_x - x p_z \\ J_z(t; x, y, z; p_x, p_y, p_z) &= x p_y - y p_x \end{aligned}$$

Show

$$[L_{J_x}, L_{J_y}] + L_{J_z} = 0. \qquad (5.489)$$


c. Relate the Jacobi identity for operators to the Poisson bracket Jacobi 
identity. 


Exercise 5.32: Baker-Campbell-Hausdorff 


Derive the rule for combining exponentials of non-commuting operators: 


$$e^A e^B = e^{A + B + \frac{1}{2}[A, B] + \cdots}. \qquad (5.490)$$


5.12 Summary 


Canonical transformations can be used to reformulate a problem 
in coordinates that are easier to understand or that expose some 
symmetry of a problem. 

In this chapter we have investigated different representations 
of a dynamical system. We have found that different representa- 
tions will be equivalent if the coordinate-momentum part of the 
transformation has symplectic derivative, and if the Hamiltonian 
transforms in a specified way. If the phase-space transformation 
is time-independent then the Hamiltonian transforms by compo- 
sition with the phase-space transformation. The symplectic con- 
dition can be equivalently expressed in terms of the fundamental 
Poisson brackets. The Poisson bracket and the $\omega$ function are
invariant under canonical transformations. The invariance of $\omega$
implies that the areas of the projections onto fundamental coordinate-momentum
planes are preserved (Poincaré integral invariant) by
canonical transformations.

We can formulate an extended phase space in which time is 
treated as another coordinate. Time dependent transformations 
are simple in the extended phase space. In the extended phase 
space the Poincaré integral invariant is the Poincaré-Cartan inte- 
gral invariant. We can also reformulate a time independent prob- 
lem as a time-dependent problem with fewer degrees of freedom 
with one of the original coordinates taking on the role of time; 
this is the reduced phase space. 

A generating function is a real-valued function of the phase
space coordinates and time that represents a canonical transformation
through its partial derivatives. We found that all canonical
transformations can be represented by a generating function.
The proof depends on the Poincaré integral invariant (and not on 
the fact that total time derivatives can be added to Lagrangians 
without changing the equations of motion). 

The time evolution of any Hamiltonian system induces a canon- 
ical transformation: if we consider all possible initial states of a 
Hamiltonian system, and we follow all of the trajectories for the 
same time interval, then the map from the initial state to the fi- 
nal state of each trajectory is a canonical transformation. This 
is true for any interval we choose, so time evolution generates a 
continuous family of canonical transformations. 

We generalized this idea to generate continuous canonical trans- 
formations other than those generated by time evolution. Such 
transformations will be especially useful in support of perturba- 
tion theory. 

In rare cases a canonical transformation can be made to a rep- 
resentation in which the problem is easily solvable: when all coor- 
dinates are ignorable and all the momenta are conserved. Here we 
investigate the Hamilton-Jacobi method for finding such canoni- 
cal transformations. For problems for which the Hamilton-Jacobi 
method works we find that the time-evolution of the system is 
given as a canonical transformation. 


6 


Canonical Perturbation Theory 


The first treatment of the Problem of Three 
Bodies, as well as of Two Bodies, was due to 
Newton. It was given in Book I, Section XI, of the 
Principia, and it was said by Airy to be “the most 
valuable chapter that was ever written on physical 
science.” ... The value of the motion of the lunar 
perigee found by Newton from theory was only 
half that given by observations. In 1872, in certain 
of Newton’s unpublished manuscripts, known as 
the Portsmouth Collection, it was found that 
Newton had accounted for the entire motion of the 
perigee by including perturbations of the second 
order. This work being unknown to astronomers, 
the motion of the lunar perigee was not otherwise 
derived from theory until the year 1749 .... 
Newton regarded the Lunar Theory as being very 
difficult, and he is said to have told his friend 
Halley in despair that it “made his head ache and 
kept him awake so often that he would think of it 
no more.” 


Forest Ray Moulton An Introduction to Celestial 
Mechanics (1914). 


Closed-form solutions of dynamical systems can only rarely be 
found. However, some systems differ from a solvable system by 
the addition of a small effect. The goal of perturbation theory is 
to relate aspects of the motion of the given system to those of the 
nearby solvable system. We can try to find a way to transform the 
exact solution of this approximate problem into an approximate 
solution to the original problem. We can also use perturbation 
theory to try to predict qualitative features of the solutions by 
describing the characteristic ways in which solutions of the solv- 
able system are distorted by the additional effects. For instance, 
we might want to predict where the largest resonance regions are 
located or the locations and sizes of the largest chaotic zones. Be- 
ing able to predict such features can give insight into the behavior 
of the particular system of interest. 
Suppose, for example, we have a system characterized by a
Hamiltonian that breaks up into two parts as follows,

$$H = H_0 + \epsilon H_1, \qquad (6.1)$$

where $H_0$ is solvable and $\epsilon$ is a small parameter. The difference
between our system and a solvable system is then a small additive
complication.

There are a number of strategies for doing this. One strategy
is to seek a canonical transformation that eliminates the terms of
order $\epsilon$ from the Hamiltonian that impede solution; this typically
introduces new terms of order $\epsilon^2$. Then seek another canonical
transformation that eliminates the terms of order $\epsilon^2$ that impede
solution, leaving terms of order $\epsilon^3$. We can imagine repeating this
process until the part that impedes solution is of such high order
in $\epsilon$ that it can be neglected. Having reduced the problem to a
solvable problem, we can reverse the sequence of transformations 
to find an approximate solution of the original problem. Does 
this process converge? How do we know we can ever neglect the 
remaining terms? Let’s follow this path and see where it goes. 


6.1 Perturbation Theory with Lie Series 


Given a system we look for a decomposition of the Hamiltonian 
in the form 


$$H(t, q, p) = H_0(t, q, p) + \epsilon H_1(t, q, p), \qquad (6.2)$$


where $H_0$ is solvable. We assume that the Hamiltonian has no
explicit time dependence; this can be ensured by going to the extended
phase space if necessary. We also assume that a canonical
transformation has been made so that $H_0$ depends solely on the
momenta:

$$\partial_1 H_0 = 0. \qquad (6.3)$$

We carry out a Lie transformation and find the conditions that
the Lie generator $W$ must satisfy to eliminate the order-$\epsilon$ terms
from the Hamiltonian.
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The Lie transform and associated Lie series specify a canonical 
transformation: 


H' = El wH = e H 
q= (E! wQ)(t, d, p) = (e77 Q)(t, d, p") 
p = (El wP)(t,q',p') = (ce P)(t,q',p') 
(t;a, p) = (EL wI) (t, q',p') = (e I)(t,q',p'), (6.4) 
where Q = J; and P = I» are the coordinate and momentum 


selectors and J is the identity function. Recall the definitions 


1 
elw F = F +eLwF + ze LwF RRE 


=F +e{F,W}+ TF, W},Wh+---, (6.5) 
with Ly F = {F,W}. 
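Applying the Lie transformation to $H$:

$$\begin{aligned}
H' &= e^{\epsilon L_W} H \\
&= H_0 + \epsilon L_W H_0 + \tfrac{1}{2}\epsilon^2 L_W^2 H_0 + \cdots \\
&\quad + \epsilon H_1 + \epsilon^2 L_W H_1 + \cdots \\
&= H_0 + \epsilon(L_W H_0 + H_1) + \epsilon^2\left(\tfrac{1}{2}L_W^2 H_0 + L_W H_1\right) + \cdots.
\end{aligned} \tag{6.6}$$

The first-order term in $\epsilon$ is zero if $W$ satisfies the condition

$$L_W H_0 + H_1 = 0, \tag{6.7}$$

which is a linear partial differential equation for $W$. The transformed Hamiltonian is

$$\begin{aligned}
H' &= H_0 + \epsilon^2\left(\tfrac{1}{2}L_W^2 H_0 + L_W H_1\right) + \cdots \\
&= H_0 + \tfrac{1}{2}\epsilon^2 L_W H_1 + \cdots,
\end{aligned} \tag{6.8}$$

where we have used condition (6.7) to simplify the $\epsilon^2$ contribution: $L_W^2 H_0 = L_W(L_W H_0) = -L_W H_1$.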
This basic step of perturbation theory has eliminated terms of a certain order (order $\epsilon$) from the Hamiltonian, but in doing so has generated new terms of higher order (here $\epsilon^2$ and higher).
At this point we can find an approximate solution by truncating Hamiltonian (6.8) to $H_0$, which is solvable. The approximate solution for given initial conditions $(t_0, q_0, p_0)$ is obtained by finding the corresponding $(t_0, q'_0, p'_0)$ using the inverse of transformation (6.4). Then the system is evolved using the solutions of the truncated Hamiltonian $H_0$ to time $t$, giving the state $(t, q', p')$. The phase-space coordinates of the evolved point are transformed back to the original variables using the transformation (6.4), giving the state $(t, q, p)$. The approximate solution is

$$\begin{aligned}
(t, q, p) &= (E_{\epsilon,W}\, E'_{t-t_0,H_0}\, E_{-\epsilon,W}\, I)(t_0, q_0, p_0) \\
&= (e^{\epsilon L_W}\, e^{(t-t_0)L_{H_0}}\, e^{-\epsilon L_W}\, I)(t_0, q_0, p_0).
\end{aligned} \tag{6.9}$$

If the Lie transform $E_{\epsilon,W} = e^{\epsilon L_W}$ must be evaluated by summing the series, then we must specify the order to which the sum extends.

Assuming everything goes well, we can imagine repeating this process to eliminate the order $\epsilon^2$ terms and so on, bringing the transformed Hamiltonian as close as we like to $H_0$. Unfortunately, there are complications. We can understand some of these complications, and how to deal with them, by considering some specific applications.


6.2 Pendulum as a Perturbed Rotor 


The pendulum is a simple one-degree-of-freedom system for which the solutions are known. If we consider the pendulum as a free rotor with the added complication of gravity, then we can carry out a perturbation step as just described and see how well it approximates the known motion of the pendulum.

The motion of a pendulum is described by the Hamiltonian

$$H(t, \theta, p) = \frac{p^2}{2\alpha} - \epsilon\beta\cos\theta, \tag{6.10}$$

with coordinate $\theta$ and conjugate angular momentum $p$, and where $\alpha = ml^2$ and $\beta = mgl$. The parameter $\epsilon$ allows us to scale the perturbation; it is 1 for the actual pendulum. We divide the Hamiltonian into the free-rotor Hamiltonian and the perturbation from gravity:

$$H = H_0 + \epsilon H_1, \tag{6.11}$$

where

$$\begin{aligned}
H_0(t, \theta, p) &= \frac{p^2}{2\alpha} \\
\epsilon H_1(t, \theta, p) &= -\epsilon\beta\cos\theta.
\end{aligned} \tag{6.12}$$


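The Lie generator $W$ satisfies condition (6.7):

$$\{H_0, W\} + H_1 = 0, \tag{6.13}$$

or

$$-\frac{p}{\alpha}\,\partial_1 W(t, \theta, p) - \beta\cos\theta = 0. \tag{6.14}$$

So

$$W(t, \theta, p) = -\frac{\alpha\beta\sin\theta}{p}, \tag{6.15}$$

where the arbitrary integration constant is ignored.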
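Condition (6.13), and the $\epsilon^2$ coefficient $\tfrac{1}{2}\{H_1, W\}$ that it leaves behind in (6.8), can also be checked numerically. The following Python sketch is an illustration, not part of the book's software. It assumes the bracket convention $\{F, G\} = \partial_1 F\,\partial_2 G - \partial_2 F\,\partial_1 G$ (the one that turns (6.13) into (6.14)), approximates partial derivatives by central differences, and uses arbitrary parameter values.

```python
import math

ALPHA, BETA = 1.3, 0.4  # arbitrary test values, not taken from the text

def H0(theta, p):
    return p * p / (2 * ALPHA)

def H1(theta, p):
    return -BETA * math.cos(theta)

def W(theta, p):
    # Lie generator (6.15)
    return -ALPHA * BETA * math.sin(theta) / p

def partials(f, theta, p, h=1e-5):
    """Central-difference estimates of (d1 f, d2 f) = (df/dtheta, df/dp)."""
    d1 = (f(theta + h, p) - f(theta - h, p)) / (2 * h)
    d2 = (f(theta, p + h) - f(theta, p - h)) / (2 * h)
    return d1, d2

def poisson(f, g, theta, p):
    """{f, g} = d1 f * d2 g - d2 f * d1 g (assumed sign convention)."""
    f1, f2 = partials(f, theta, p)
    g1, g2 = partials(g, theta, p)
    return f1 * g2 - f2 * g1

def residual_6_13(theta, p):
    """{H0, W} + H1, which should vanish identically."""
    return poisson(H0, W, theta, p) + H1(theta, p)

def eps2_coefficient(theta, p):
    """(1/2){H1, W}, the epsilon^2 coefficient in (6.8)."""
    return 0.5 * poisson(H1, W, theta, p)
```

For the pendulum, `eps2_coefficient` should reproduce $\alpha\beta^2\sin^2\theta/(2p^2)$, the term that replaces the eliminated order $\epsilon$ term.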

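The transformed Hamiltonian is $H' = H_0 + O(\epsilon^2)$. If we can ignore the $\epsilon^2$ contributions, then the transformed Hamiltonian is simply

$$H'(t, \theta', p') = \frac{(p')^2}{2\alpha}, \tag{6.16}$$

with solutions

$$\begin{aligned}
\theta' &= \theta'_0 + \frac{p'_0}{\alpha}(t - t_0) \\
p' &= p'_0.
\end{aligned} \tag{6.17}$$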


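To connect these solutions to the solutions of the original problem we use the Lie series

$$\begin{aligned}
\theta &= (e^{\epsilon L_W} Q)(t, \theta', p') \\
&= \theta' + \epsilon\{Q, W\}(t, \theta', p') + \cdots \\
&= \theta' + \epsilon\,\partial_2 W(t, \theta', p') + \cdots \\
&= \theta' + \epsilon\frac{\alpha\beta\sin\theta'}{(p')^2} + \cdots.
\end{aligned} \tag{6.18}$$

Similarly,

$$p = p' + \epsilon\frac{\alpha\beta\cos\theta'}{p'} + \cdots. \tag{6.19}$$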
Note that if the Lie series is truncated it is not exactly a canonical 
transformation; only the infinite series is canonical. 
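The initial values $\theta'_0$ and $p'_0$ are determined from the initial values of $\theta$ and $p$ by the inverse Lie transformation:

$$\theta' = (e^{-\epsilon L_W} Q)(t, \theta, p) = \theta - \epsilon\frac{\alpha\beta\sin\theta}{p^2} + \cdots \tag{6.20}$$

and

$$p' = (e^{-\epsilon L_W} P)(t, \theta, p) = p - \epsilon\frac{\alpha\beta\cos\theta}{p} + \cdots. \tag{6.21}$$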


Note that if we truncate the coordinate transformations after the 
first order terms in e (or any finite order) then the inverse trans- 
formation is not exactly the inverse of the transformation. 
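Both caveats can be seen numerically. The Python sketch below is an illustration, not the book's code: it uses the first-order truncation of (6.18) and (6.19), writing $a = \epsilon\alpha\beta$, measures how far the Jacobian determinant departs from 1 (in one degree of freedom a map is canonical exactly when it preserves area), and measures the round-trip error of the truncated inverse followed by the truncated forward map. Both defects are of order $\epsilon^2$.

```python
import math

def first_order_map(theta, p, a):
    """First-order truncation of (6.18) and (6.19), with a = epsilon*alpha*beta.

    Passing -a gives the first-order truncation of the inverse, (6.20) and (6.21).
    """
    return (theta + a * math.sin(theta) / p**2,
            p + a * math.cos(theta) / p)

def jacobian_det(theta, p, a, h=1e-6):
    """Jacobian determinant of first_order_map, by central differences.

    In one degree of freedom the map is canonical exactly when this equals 1.
    """
    th_plus, p_plus = first_order_map(theta + h, p, a)
    th_minus, p_minus = first_order_map(theta - h, p, a)
    dth_dth = (th_plus - th_minus) / (2 * h)
    dp_dth = (p_plus - p_minus) / (2 * h)
    th_plus, p_plus = first_order_map(theta, p + h, a)
    th_minus, p_minus = first_order_map(theta, p - h, a)
    dth_dp = (th_plus - th_minus) / (2 * h)
    dp_dp = (p_plus - p_minus) / (2 * h)
    return dth_dth * dp_dp - dth_dp * dp_dth

def round_trip_error(theta, p, a):
    """Truncated inverse, then truncated forward map; exact inverses would give 0."""
    th1, p1 = first_order_map(theta, p, -a)
    th2, p2 = first_order_map(th1, p1, a)
    return max(abs(th2 - theta), abs(p2 - p))
```

Working the determinant out by hand gives $1 - a^2(1 + \sin^2\theta')/(p')^4$, so halving $\epsilon$ should reduce the symplectic residual by a factor of four.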

The approximate solution for given initial conditions $(t_0, \theta_0, p_0)$ is obtained by finding the corresponding $(t_0, \theta'_0, p'_0)$ using the transformations (6.20) and (6.21). Then the system is evolved using the solutions (6.17). The phase-space coordinates of the evolved point are transformed back to the original variables using the transformations (6.18) and (6.19).
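This recipe can be sketched in a few lines of Python, independent of the scmutils code. This is a hypothetical stand-alone illustration: it keeps only the first-order terms of (6.18) through (6.21) and compares the result with a fixed-step fourth-order Runge-Kutta integration of the full Hamiltonian (6.10).

```python
import math

ALPHA, BETA, EPS = 1.0, 0.1, 1.0  # parameter values quoted for figures 6.1 and 6.2

def lie_transform_1(theta, p, eps):
    """First-order Lie series (6.18)-(6.19); eps -> -eps gives (6.20)-(6.21)."""
    a = eps * ALPHA * BETA
    return theta + a * math.sin(theta) / p**2, p + a * math.cos(theta) / p

def perturbative_state(theta0, p0, t):
    """Inverse transform, free-rotor evolution (6.17) for time t, forward transform."""
    th, pp = lie_transform_1(theta0, p0, -EPS)
    th += pp * t / ALPHA
    return lie_transform_1(th, pp, EPS)

def pendulum_rhs(theta, p):
    """Hamilton's equations for the full Hamiltonian (6.10)."""
    return p / ALPHA, -EPS * BETA * math.sin(theta)

def rk4_state(theta, p, t, dt=1e-3):
    """Reference solution: classical fourth-order Runge-Kutta with fixed step dt."""
    for _ in range(round(t / dt)):
        k1 = pendulum_rhs(theta, p)
        k2 = pendulum_rhs(theta + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = pendulum_rhs(theta + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = pendulum_rhs(theta + dt * k3[0], p + dt * k3[1])
        theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        p += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, p
```

For a circulating initial state such as $\theta = 0$, $p = 1.5$ (an arbitrary choice), the perturbative angle at $t = 5$ should lie much closer to the integrated one than the unperturbed rotor $\theta_0 + p_0 t/\alpha$ does. Started inside the oscillation region, at $p = 0.55$, the sketch keeps circulating while the actual pendulum does not.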

We define the two parts of the pendulum Hamiltonian: 


(define ((H0 alpha) state)
  (let ((ptheta (momentum state)))
    (/ (square ptheta) (* 2 alpha))))

(define ((H1 beta) state)
  (let ((theta (coordinate state)))
    (* -1 beta (cos theta))))


The Hamiltonian for the pendulum can be expressed as a series expansion in the parameter $\epsilon$ by

(define (H-pendulum-series alpha beta epsilon)
  (series (H0 alpha) (* epsilon (H1 beta))))

where the series procedure is a constructor for a series whose first terms are given and all further terms are zero. The Lie generator that eliminates the order $\epsilon$ terms is


(define ((W alpha beta) state)
  (let ((theta (coordinate state))
        (ptheta (momentum state)))
    (/ (* -1 alpha beta (sin theta)) ptheta)))

We check that W satisfies condition (6.7):¹

(print-expression
 ((+ ((Lie-derivative (W 'alpha 'beta)) (H0 'alpha))
     (H1 'beta))
  a-state))
0


and that it has the desired effect on the Hamiltonian:

(show-expression
 (series:sum
  (((exp (* 'epsilon (Lie-derivative (W 'alpha 'beta))))
    (H-pendulum-series 'alpha 'beta 'epsilon))
   a-state)
  2))

$$\frac{p_\theta^2}{2\alpha} + \frac{\alpha\beta^2\epsilon^2\sin^2\theta}{2p_\theta^2}$$

Indeed, the order $\epsilon$ term has been removed, and an order $\epsilon^2$ term has been introduced.
Ignoring the $\epsilon^2$ terms in the new Hamiltonian, the solution is

(define (((solution0 alpha beta) t) state0)
  (let ((t0 (time state0))
        (theta0 (coordinate state0))
        (ptheta0 (momentum state0)))
    (up t
        (+ theta0 (/ (* (- t t0) ptheta0) alpha))
        ptheta0)))

¹We use the typical pendulum state

(define a-state (up 't 'theta 'p_theta))
The transformation from primed to unprimed phase-space coordinates, including terms up to the order given by the argument order, is

(define ((C alpha beta epsilon order) state)
  (series:sum
   (((Lie-transform (W alpha beta) epsilon)
     identity)
    state)
   order))


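To second order in $\epsilon$ the transformation generated by W is

(show-expression ((C 'alpha 'beta 'epsilon 2) a-state))

$$\begin{pmatrix} t \\ \theta + \dfrac{\alpha\beta\epsilon\sin\theta}{p_\theta^2} - \dfrac{\alpha^2\beta^2\epsilon^2\cos\theta\sin\theta}{2p_\theta^4} \\ p_\theta + \dfrac{\alpha\beta\epsilon\cos\theta}{p_\theta} - \dfrac{\alpha^2\beta^2\epsilon^2}{2p_\theta^3} \end{pmatrix}$$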


The inverse transformation is

(define (C-inv alpha beta epsilon order)
  (C alpha beta (- epsilon) order))

Using these components the perturbative solution (equation 6.9) is

(define (((solution epsilon order) alpha beta) delta-t)
  (compose (C alpha beta epsilon order)
           ((solution0 alpha beta) delta-t)
           (C-inv alpha beta epsilon order)))

The resulting procedure maps an initial state to the solution state advanced by delta-t.

We can examine the behavior of the perturbative solution and 
compare it to the true behavior of the pendulum. There are several 
considerations. We have truncated the Lie series for the phase- 
space transformation. Does the missing part matter? If the miss- 
ing part does not matter, how well does this perturbation step 
work? 

Figure 6.1 shows that as we increase the number of terms in the Lie series for the phase-space coordinate transformation, the result appears to converge. The lone trajectory includes only terms of first order. The others, including terms of second, third, and fourth order, are closely clustered. On the left edge of the graph (at $\theta = -\pi$) the order of the solution increases from the top to the bottom of the graph. In the middle (at $\theta = 0$) the fourth-order curve is between the second-order curve and the third-order curve. In addition to the error in the phase-space path, there is also an error in the period: the higher-order orbits have longer periods than the first-order orbit. The parameters are $\alpha = 1.0$ and $\beta = 0.1$. We have set $\epsilon = 1$. Each trajectory was started at $\theta = 0$ with $p_\theta = 0.7$. Notice that the initial point on the solution varies between trajectories. This is because the transformation is not perfectly inverted by the truncated Lie series.

Figure 6.2 compares the perturbative solution (with terms up to fourth order) with the actual trajectory of the pendulum. The initial points coincide to the precision of the graph, because with terms to fourth order the truncated transformation and its truncated inverse agree to that precision. The trajectories deviate both in the phase plane and in the period, but they are still quite close.

The trajectories of figures 6.1 and 6.2 are all for the same initial state. As we vary the initial state we find that for trajectories that are in the circulation region, far from the separatrix, the perturbative solution does quite well. However, if we get close to the separatrix, or if we enter the oscillation region, the perturbative solution is nothing like the real solution, and it does not even seem to converge. Figure 6.3 shows what happens when we try to use the perturbative solution inside the oscillation region. Each trajectory was started at $\theta = 0$ with $p_\theta = 0.55$. The parameters are $\alpha = 1.0$ and $\beta = 0.1$.

This failure of the perturbative solution should not be surprising. We assumed that the real motion was a distorted version of the motion of the free rotor. But in the oscillation region the assumption is not true: the pendulum is not rotating at all. The perturbative solutions can only be valid (if they work at all!) in a region where the topology of the real orbits is the same as the topology of the perturbative solutions.

We can make a crude estimate of the range of validity of the perturbative solution by looking at the first correction term in the phase-space transformation (6.18). The correction in $\theta$ is proportional to $\epsilon\alpha\beta/(p')^2$. This is not a small perturbation if

$$|p'| < \sqrt{\epsilon\alpha\beta}. \tag{6.22}$$
Figure 6.1 The perturbative solution in the phase plane, including terms of first, second, third, and fourth order in the phase-space coordinate transformation. The solutions appear to converge.


Figure 6.2 The perturbative solution in the phase plane, including 
terms of fourth order in the phase-space coordinate transformation, is 
compared with the actual trajectory. The actual trajectory is the lower 
of the two curves. The parameters are the same as in figure 6.1. 
Figure 6.3 The perturbative solution does not converge in the oscillation region. As we include more terms in the Lie series for the phase-space transformation, the resulting trajectory develops loops near the hyperbolic fixed point that increase in size with the order.


This sets the scale for the validity of the perturbative solution. 

We can compare this scale to the size of the oscillation region (see figure 6.4). We can calculate the extent of the region of oscillation of the pendulum by considering the separatrix. The value of the Hamiltonian on the separatrix is the same as the value at the unstable equilibrium: $H(t, \theta = \pi, p_\theta = 0) = \beta\epsilon$. The separatrix has maximum momentum $p_\theta^{\max}$ at $\theta = 0$:

$$H(t, 0, p_\theta^{\max}) = H(t, \pi, 0). \tag{6.23}$$

Solving for $p_\theta^{\max}$, the half-width of the region of oscillation, we find

$$p_\theta^{\max} = 2\sqrt{\alpha\beta\epsilon}. \tag{6.24}$$

Comparing equations (6.22) and (6.24), we see that the requirement that the terms in the perturbation solution be small excludes a region of the phase space with the same scale as the region of oscillation of the pendulum.
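For the parameters used in the figures ($\alpha = 1.0$, $\beta = 0.1$, $\epsilon = 1$) the two scales are easy to evaluate. This Python sketch (an illustration, not from the book) computes (6.22) and (6.24) and classifies an initial state by comparing its energy with the separatrix energy $\epsilon\beta$.

```python
import math

def validity_scale(alpha, beta, eps):
    """|p'| below which the first correction in (6.18) is not small, from (6.22)."""
    return math.sqrt(eps * alpha * beta)

def separatrix_half_width(alpha, beta, eps):
    """Maximum momentum on the separatrix, from (6.24)."""
    return 2 * math.sqrt(alpha * beta * eps)

def circulates(theta, p, alpha, beta, eps):
    """True if the state lies outside the separatrix, i.e. H > H(t, pi, 0) = eps*beta."""
    energy = p**2 / (2 * alpha) - eps * beta * math.cos(theta)
    return energy > eps * beta
```

With these parameters the sketch gives $\sqrt{0.1} \approx 0.316$ and $2\sqrt{0.1} \approx 0.632$, so the initial momentum 0.7 of figures 6.1 and 6.2 lies just outside the oscillation region and the initial momentum 0.55 of figure 6.3 lies inside it.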

What the perturbation theory is doing is deforming the phase-space coordinate system so that the problem looks like the free-rotor problem. This deformation is only sensible in the circulating case. So it is not surprising that the perturbation theory fails in the oscillation region. What may be surprising is how well the perturbation theory works just outside the oscillation region. The range of $p_\theta$ in which the perturbation theory is not valid scales in the same way as the width of the oscillation region. This need not have been the case; the perturbation theory could have failed over a wider range.

Figure 6.4 The oscillation region of the pendulum is delimited by the separatrix. The maximum momentum occurs at the zero-crossing of the angle. The energy is conserved, so its value is the same at the unstable fixed point and at the point of maximum momentum. At the unstable fixed point the energy is entirely potential energy, because the momentum is zero. We use this to compute the maximum momentum (where the potential energy is zero and all of the energy is kinetic).


Exercise 6.1: Symplectic residual 


Compute the residual in the symplectic test for various orders of trunca- 
tion of the Lie series for transformation (C alpha beta epsilon order). 


6.2.1 Higher Order 


We can improve the perturbative solution by carrying out addi- 
tional perturbation steps. The overall plan is the same as before. 
We perform a Lie transformation with a new generator that elim- 
inates the desired terms from the Hamiltonian. 
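After the first step the Hamiltonian is, to second order in $\epsilon$,

$$\begin{aligned}
H'(t, \theta', p') &= \frac{(p')^2}{2\alpha} + \frac{1}{2}\epsilon^2\frac{\alpha\beta^2}{(p')^2}(\sin\theta')^2 + \cdots \\
&= \frac{(p')^2}{2\alpha} + \frac{\epsilon^2\alpha\beta^2}{4(p')^2}\left(1 - \cos(2\theta')\right) + \cdots \\
&= H_0(p') + \epsilon^2 H_2(t, \theta', p') + \cdots.
\end{aligned} \tag{6.25}$$

Performing a Lie transformation with generator $W'$, the new Hamiltonian is

$$\begin{aligned}
H'' &= e^{\epsilon^2 L_{W'}} H' \\
&= H_0 + \epsilon^2\left(L_{W'} H_0 + H_2\right) + \cdots.
\end{aligned} \tag{6.26}$$

So the condition on $W'$ that the second-order terms are eliminated is

$$L_{W'} H_0 + H_2 = 0. \tag{6.27}$$

This is

$$-\frac{p'}{\alpha}\,\partial_1 W'(t, \theta', p') + \frac{\alpha\beta^2}{4(p')^2}\left(1 - \cos(2\theta')\right) = 0. \tag{6.28}$$

A generator that satisfies this condition is

$$W'(t, \theta', p') = \frac{\alpha^2\beta^2}{4(p')^3}\,\theta' - \frac{\alpha^2\beta^2}{8(p')^3}\sin(2\theta'). \tag{6.29}$$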
There are two contributions to this generator, one proportional to $\theta'$ and the other involving a trigonometric function of $\theta'$.

The phase-space coordinate transformation resulting from this Lie transform is found as before. For given initial conditions, we first carry out the inverse transformation corresponding to $W$, then that for $W'$, solve for the evolution of the system using $H_0$, then transform back using $W'$ and then $W$. The approximate solution is

$$\begin{aligned}
(t, \theta, p) &= (E_{\epsilon,W}\, E_{\epsilon^2,W'}\, E'_{t-t_0,H_0}\, E_{-\epsilon^2,W'}\, E_{-\epsilon,W}\, I)(t_0, \theta_0, p_0) \\
&= (e^{\epsilon L_W}\, e^{\epsilon^2 L_{W'}}\, e^{(t-t_0)L_{H_0}}\, e^{-\epsilon^2 L_{W'}}\, e^{-\epsilon L_W}\, I)(t_0, \theta_0, p_0).
\end{aligned} \tag{6.30}$$




The solution obtained in this way is compared to the actual evolution of the pendulum in figure 6.5. Terms in all Lie series up to $\epsilon^4$ are included. The perturbative solution, including this second perturbative step, is much closer to the actual solution in the initial segment, but then the two begin to diverge. The time interval spanned is 10. Over longer times the divergence is actually severe, as shown in figure 6.6, where the time interval spanned is 100. These solutions begin at $\theta = 0$ with $p_\theta = 0.7$. The parameters are $\alpha = 1.0$ and $\beta = 0.1$.

A problem with the perturbative solution is that there are terms in $W'$ and in the corresponding phase-space coordinate transformation that are proportional to $\theta'$, and $\theta'$ grows linearly with time. So the solution can only be valid for small times; the interval of validity depends on the frequency of the particular trajectory under investigation and the size of the coefficients multiplying the various terms. Such terms in a perturbative representation of the solution that are proportional to time are called secular terms. They limit the validity of the perturbation theory to small times.


6.2.2 Eliminating Secular Terms 


There is a simple solution to the problem of secular terms, developed by Lindstedt and Poincaré. The goal of each perturbation step is to eliminate terms in the Hamiltonian that prevent solution. However, the term in $H'$ that led to the secular term in the generator $W'$ does not actually impede solution. So a better procedure is to leave that term in the Hamiltonian and find the generator $W''$ that only eliminates the term that is periodic in $\theta'$. So $W''$ satisfies

$$-\frac{p'}{\alpha}\,\partial_1 W''(t, \theta', p') - \frac{\alpha\beta^2}{4(p')^2}\cos(2\theta') = 0. \tag{6.31}$$

The generator is

$$W''(t, \theta', p') = -\frac{\alpha^2\beta^2}{8(p')^3}\sin(2\theta'). \tag{6.32}$$
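Both generators can be checked against their defining conditions, and the difference between them made concrete, with a short Python sketch (an illustration, not the book's code; parameter values are hypothetical). It evaluates the left-hand sides of (6.28) and (6.31) using a central difference for $\partial_1$, and relies on $\{H_0, W\} = -(p/\alpha)\,\partial_1 W$, which holds because $H_0$ depends only on the momentum.

```python
import math

ALPHA, BETA = 1.0, 0.1  # hypothetical test values

def d_theta(f, theta, p, h=1e-5):
    """Central-difference estimate of the partial derivative with respect to theta."""
    return (f(theta + h, p) - f(theta - h, p)) / (2 * h)

def W_prime(theta, p):
    # (6.29): a secular part proportional to theta plus a periodic part
    c = ALPHA**2 * BETA**2 / p**3
    return c * theta / 4 - c * math.sin(2 * theta) / 8

def W_double_prime(theta, p):
    # (6.32): periodic part only
    return -ALPHA**2 * BETA**2 * math.sin(2 * theta) / (8 * p**3)

def H2(theta, p):
    # epsilon^2 coefficient in (6.25)
    return ALPHA * BETA**2 * (1 - math.cos(2 * theta)) / (4 * p**2)

def lhs_6_28(theta, p):
    """{H0, W'} + H2, which should vanish."""
    return -(p / ALPHA) * d_theta(W_prime, theta, p) + H2(theta, p)

def lhs_6_31(theta, p):
    """{H0, W''} plus the periodic part of H2, which should vanish."""
    periodic = -ALPHA * BETA**2 * math.cos(2 * theta) / (4 * p**2)
    return -(p / ALPHA) * d_theta(W_double_prime, theta, p) + periodic
```

Advancing $\theta'$ by $2\pi$ shifts $W'$ by $\pi\alpha^2\beta^2/(2(p')^3)$ but leaves $W''$ unchanged, which is the secular growth that the Lindstedt-Poincaré choice avoids.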


After performing a Lie transformation with this generator the new Hamiltonian is

$$H''(t, \theta'', p'') = \frac{(p'')^2}{2\alpha} + \frac{\epsilon^2\alpha\beta^2}{4(p'')^2} + \cdots. \tag{6.33}$$
Figure 6.5 The solution using a second perturbation step, eliminating $\epsilon^2$ terms from the Hamiltonian, is compared to the actual solution. The initial agreement is especially good, but the error increases with time.

Figure 6.6 The two-step perturbative solution is shown over a longer time. The actual solution is a closed curve in the phase plane; this perturbative solution wanders all over the place and gets worse with time.
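Including terms up to the $\epsilon^2$ term, the solution is

$$\begin{aligned}
\theta'' &= \theta''_0 + \left(\frac{p''_0}{\alpha} - \frac{\epsilon^2\alpha\beta^2}{2(p''_0)^3}\right)(t - t_0) \\
p'' &= p''_0.
\end{aligned} \tag{6.34}$$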


We construct the solution for a given initial condition as before by composing the transformations, the solution of the modified Hamiltonian, and the inverse transformations. The approximate solution is

$$\begin{aligned}
(t, \theta, p) &= (E_{\epsilon,W}\, E_{\epsilon^2,W''}\, E'_{t-t_0,H''}\, E_{-\epsilon^2,W''}\, E_{-\epsilon,W}\, I)(t_0, \theta_0, p_0) \\
&= (e^{\epsilon L_W}\, e^{\epsilon^2 L_{W''}}\, e^{(t-t_0)L_{H''}}\, e^{-\epsilon^2 L_{W''}}\, e^{-\epsilon L_W}\, I)(t_0, \theta_0, p_0).
\end{aligned} \tag{6.35}$$

The resulting phase-space evolution is shown in figure 6.7. Now the perturbative solution is a closed curve in the phase plane and is in pretty good agreement with the actual solution.



Figure 6.7 The two-step perturbative solution without secular terms 
is compared to the actual solution. The perturbative solution is now a 
closed curve and is very close to the actual solution. 


By modifying the solvable part of the Hamiltonian we are modifying the frequency of the solution. The secular terms appeared because we were trying to approximate a solution with one frequency as a Fourier series with the wrong frequency. As an analogy consider

$$\begin{aligned}
\sin((\omega + \Delta\omega)t) &= \sin(\omega t)\cos(\Delta\omega\,t) + \cos(\omega t)\sin(\Delta\omega\,t) \\
&= \sin(\omega t)\left(1 - \frac{(\Delta\omega\,t)^2}{2} + \cdots\right) + \cos(\omega t)\left(\Delta\omega\,t - \cdots\right).
\end{aligned} \tag{6.36}$$

The periodic terms are multiplied by terms that are polynomials in the time. These polynomials are initial segments of the power series for periodic functions. The infinite series are convergent, but if the series are truncated the error is large at large times.
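The growth of the truncation error in (6.36) is easy to see numerically. In the Python sketch below (illustrative only; the frequencies are arbitrary), each power series in (6.36) is cut after its first correction, so the approximation is a polynomial of degree two in $\Delta\omega\,t$ multiplying the periodic factors.

```python
import math

def exact(omega, d_omega, t):
    return math.sin((omega + d_omega) * t)

def truncated(omega, d_omega, t):
    """Right side of (6.36) with each power series cut after its first correction."""
    x = d_omega * t
    return math.sin(omega * t) * (1 - x**2 / 2) + math.cos(omega * t) * x

def error(omega, d_omega, t):
    return abs(exact(omega, d_omega, t) - truncated(omega, d_omega, t))
```

With $\omega = 1$ and $\Delta\omega = 0.01$ the error is of order $10^{-4}$ at $t = 10$ but far larger than 1 at $t = 1000$, although the exact function never exceeds 1 in magnitude.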

Continuing the perturbative solution to higher orders is now 
a straightforward repetition of the steps we have carried out so 
far. At each step in the perturbation solution there will be new 
contributions to the solvable part of the Hamiltonian that absorb 
potential secular terms. The contribution is just the angle inde- 
pendent part of the Hamiltonian after the Hamiltonian is written 
as a Fourier series. The constant part of the Fourier series is the 
same as the average of the Hamiltonian over the angle. So at 
each step in the perturbation theory, the average of the perturba- 
tion is included with the solvable part of the Hamiltonian and the 
periodic part is eliminated by a Lie transformation. 
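The frequency correction in (6.34) can be compared with the exact rotation frequency of the pendulum. In this Python sketch (an illustration, not from the book) we identify $p''$ with the action $J = \frac{1}{2\pi}\oint p\,d\theta$ of the circulating orbit; this is an assumption justified to the order retained, because $H''$ depends only on $p''$ and $\theta''$ advances by $2\pi$ per revolution. The exact frequency and action are computed by quadrature.

```python
import math

def exact_frequency_and_action(energy, alpha, beta, eps, n=4096):
    """Rotation frequency and action of a circulating pendulum (6.10), by quadrature.

    Requires energy > eps * beta. On the orbit p(theta) = sqrt(2 alpha (E + eps beta cos theta)).
    The period is the integral of alpha / p over one revolution and the action is
    (1/2pi) times the integral of p. The trapezoid rule is used because it
    converges very quickly for smooth periodic integrands.
    """
    h = 2 * math.pi / n
    period = 0.0
    action = 0.0
    for i in range(n):
        p = math.sqrt(2 * alpha * (energy + eps * beta * math.cos(i * h)))
        period += alpha / p * h
        action += p * h
    return 2 * math.pi / period, action / (2 * math.pi)

def perturbative_frequency(action, alpha, beta, eps):
    """Rate of change of theta'' in (6.34), with p'' identified with the action."""
    return action / alpha - eps**2 * alpha * beta**2 / (2 * action**3)
```

For $\alpha = 1$, $\beta = 0.1$, $\epsilon = 1$ and an orbit through $\theta = 0$, $p = 1.5$ (arbitrary choices), the corrected frequency should agree with the exact one far better than the unperturbed rotor frequency $J/\alpha$ does.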


6.3 Many Degrees of Freedom 


Other problems are encountered in applying perturbation theory to systems with more than a single degree of freedom. Consider a Hamiltonian of the form

$$H = H_0 + \epsilon H_1, \tag{6.37}$$

where $H_0$ depends only on the momenta and so is solvable. We assume that the Hamiltonian has no explicit time dependence. We further assume that the coordinates are all angles, and that $H_1$ is a multiply periodic function of the coordinates.

Carrying out a Lie transformation with generator $W$, the new Hamiltonian is

$$\begin{aligned}
H' &= e^{\epsilon L_W} H \\
&= H_0 + \epsilon(L_W H_0 + H_1) + \cdots,
\end{aligned} \tag{6.38}$$

as before. The condition that the order $\epsilon$ terms are eliminated is

$$\{H_0, W\} + H_1 = 0, \tag{6.39}$$

a linear partial differential equation. By assumption, the Hamiltonian $H_0$ depends only on the momenta. We define

$$\omega_0(p) = \partial_2 H_0(t, \theta, p), \tag{6.40}$$

the tuple of frequencies of the unperturbed system. The condition on $W$ is

$$\omega_0(p) \cdot \partial_1 W(t, \theta, p) = H_1(t, \theta, p). \tag{6.41}$$


As $H_1$ is a multiply periodic function of the coordinates we can write it as a Poisson series²

$$H_1(t, \theta, p) = \sum_k A_k(p)\cos(k \cdot \theta). \tag{6.42}$$

Similarly, we assume $W$ can be written as a Poisson series:

$$W(t, \theta, p) = \sum_k B_k(p)\sin(k \cdot \theta). \tag{6.43}$$

Substituting these into the condition that order $\epsilon$ terms are eliminated, we find

$$\sum_k B_k(p)\,(\omega_0(p) \cdot k)\cos(k \cdot \theta) = \sum_k A_k(p)\cos(k \cdot \theta). \tag{6.44}$$

The cosines are orthogonal, so the coefficients of each cosine must agree individually. We deduce

$$B_k(p) = \frac{A_k(p)}{k \cdot \omega_0(p)}, \tag{6.45}$$

and that the required Lie generator is

$$W(t, \theta, p) = \sum_k \frac{A_k(p)}{k \cdot \omega_0(p)}\sin(k \cdot \theta). \tag{6.46}$$

²In general, we need to include sine terms as well, but the cosine expansion is enough for this illustration.


There are a couple of problems. First, if $A_0$ is non-zero then the expression for $B_0$ involves a division by zero. So the expression for $B_0$ is not correct. The problem is that the corresponding term in $H_1$ does not involve $\theta$. So the integration for $B_0$ should introduce linear terms in $\theta$. But this is the same situation that led to the secular terms in the perturbation approximation to the pendulum. Having learned our lesson there, we avoid the secular terms by adjoining this term to the solvable Hamiltonian and excluding $k = 0$ from the sum for $W$. We have

$$H' = H_0 + \epsilon A_0 + \cdots, \tag{6.47}$$

and

$$W(t, \theta, p) = \sum_{k \neq 0}\frac{A_k(p)}{k \cdot \omega_0(p)}\sin(k \cdot \theta). \tag{6.48}$$
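Equations (6.45) through (6.48) amount to a term-by-term division, which the following Python sketch carries out for a Poisson series stored as a dictionary from integer tuples $k$ to coefficients $A_k$ at a fixed momentum. The representation and the function name are hypothetical. The $k = 0$ term is returned separately because, as in (6.47), it is adjoined to the solvable Hamiltonian rather than divided.

```python
def lie_generator(A, omega0, tol=1e-12):
    """Coefficients B_k of (6.48) from the coefficients A_k of (6.42).

    A maps integer tuples k to A_k(p) at a fixed p; omega0 is the frequency
    tuple (6.40) at that p. Returns (A_0, B), where B maps each k != 0 to
    A_k / (k . omega0), the coefficient of sin(k . theta) in W.
    Raises ZeroDivisionError if some k . omega0 vanishes (an exact resonance).
    """
    B = {}
    A0 = 0.0
    for k, a in A.items():
        if all(ki == 0 for ki in k):
            A0 += a  # adjoined to the solvable Hamiltonian, as in (6.47)
            continue
        divisor = sum(ki * wi for ki, wi in zip(k, omega0))
        if abs(divisor) < tol:
            raise ZeroDivisionError(f"resonant term k={k}")
        B[k] = a / divisor
    return A0, B
```

A term whose divisor is small, such as $k = (1, -1)$ when the two frequencies are nearly equal, produces a coefficient much larger than its $A_k$, which is the small-denominator problem discussed next.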


Another problem is that there are many opportunities for small denominators, which would make the perturbation large and therefore not a perturbation. As we saw in the perturbation approximation for the pendulum in terms of the rotor, we must exclude certain regions from the domain of applicability of the perturbation approximation. Consider the phase-space transformation of the coordinates

$$\begin{aligned}
\theta &= (e^{\epsilon L_W} Q)(t, \theta', p') \\
&= \theta' + \epsilon\,\partial_2 W(t, \theta', p') + \cdots \\
&= \theta' + \epsilon\sum_{k \neq 0}\left(\frac{DA_k(p')}{k \cdot \omega_0(p')} - \frac{A_k(p')\,(k \cdot D\omega_0(p'))}{(k \cdot \omega_0(p'))^2}\right)\sin(k \cdot \theta') + \cdots.
\end{aligned} \tag{6.49}$$

So we must exclude from the domain of applicability all regions for which the coefficients are large. If the second term dominates, the excluded regions satisfy

$$\left|(k \cdot D\omega_0(p'))\,A_k(p')\right| > (k \cdot \omega_0(p'))^2. \tag{6.50}$$
Considering the fact that for any tuple of frequencies $\omega_0(p')$ we can find a tuple of integers $k$ such that $k \cdot \omega_0(p')$ is arbitrarily small, this problem of small divisors looks very serious.
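The approach to zero is easy to observe. This Python sketch (an illustration; the frequency pair $(1, \phi)$, with $\phi$ the golden ratio, is a hypothetical choice) searches all nonzero integer pairs with $|k_1| + |k_2| \le K$ and returns the smallest $|k \cdot \omega_0|$.

```python
import math
from itertools import product

def smallest_divisor(omega, K):
    """min |k . omega| over nonzero integer pairs k with |k1| + |k2| <= K."""
    best = math.inf
    for k1, k2 in product(range(-K, K + 1), repeat=2):
        if (k1, k2) != (0, 0) and abs(k1) + abs(k2) <= K:
            best = min(best, abs(k1 * omega[0] + k2 * omega[1]))
    return best
```

For $K = 10$ the smallest divisor is about 0.146, from $k = (-5, 3)$; for $K = 100$ it has dropped to about 0.013, from $k = (-55, 34)$. The minimizing pairs are built from consecutive Fibonacci numbers, and raising $K$ further keeps pushing the minimum toward zero.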

However, the problem, though serious, is not as bad as it may appear, for a couple of reasons. First, it may be that $A_k \neq 0$ only for certain $k$. In this case, the regions excluded from the domain of applicability are limited to those associated with these terms. Second, for analytic functions the magnitude of $A_k$ decreases strongly with the size of $k$ (see [4]):

$$|A_k(p')| < C e^{-G|k|_+}, \tag{6.51}$$

for some positive $G$ and $C$, and where $|k|_+ = |k_0| + |k_1| + \cdots$. At any stage of a perturbation approximation we can limit consideration to just those terms that are larger than a specified magnitude. The excluded regions corresponding to these terms decrease exponentially with order, with size of order the square root of $|A_k(p')|$ in the inequality (6.51).


6.3.1 Driven Pendulum as a Perturbed Rotor 


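More concretely, consider the periodically driven pendulum. We will develop approximate solutions for the driven pendulum as a perturbed rotor.

We use the Hamiltonian

$$H(t, \theta, p) = \frac{p^2}{2ml^2} - ml\left(g - A\omega^2\cos(\omega t)\right)\cos\theta. \tag{6.52}$$

We can remove the explicit time dependence by going to the extended phase space. The Hamiltonian is

$$\begin{aligned}
H(\tau; \theta, t; p, T) &= T + \frac{p^2}{2ml^2} - ml\left(g - A\omega^2\cos(\omega t)\right)\cos\theta \\
&= T + \frac{p^2}{2\alpha} - \beta\cos\theta + \gamma\cos(\theta - \omega t) + \gamma\cos(\theta + \omega t),
\end{aligned} \tag{6.53}$$

with the constants $\alpha = ml^2$, $\beta = mlg$, and $\gamma = \tfrac{1}{2}mlA\omega^2$.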
With the intent to approximate the driven pendulum as a perturbed rotor we choose

$$\begin{aligned}
H_0(\tau; \theta, t; p, T) &= T + \frac{p^2}{2\alpha} \\
H_1(\tau; \theta, t; p, T) &= -\beta\cos\theta + \gamma\cos(\theta + \omega t) + \gamma\cos(\theta - \omega t).
\end{aligned} \tag{6.54}$$

Notice that the perturbation $H_1$ is particularly simple: it has only three terms in its Poisson series, and the coefficients are constants. So in the first perturbation step there will be only three regions excluded from the domain of applicability.

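The Lie series generator that eliminates the terms in $H_1$ to first order in $\epsilon$, satisfying

$$\{H_0, W\} + H_1 = 0, \tag{6.55}$$

is

$$W(\tau; \theta, t; p, T) = -\frac{\beta}{\omega_r(p)}\sin\theta + \frac{\gamma}{\omega_r(p) + \omega}\sin(\theta + \omega t) + \frac{\gamma}{\omega_r(p) - \omega}\sin(\theta - \omega t), \tag{6.56}$$

where $\omega_r(p) = \partial_{2,0} H_0(\tau; \theta, t; p, T) = p/\alpha$ is the unperturbed rotor frequency.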

The resulting approximate solution has three regions in which there are small denominators, and so three regions that are excluded from applicability of the perturbative solution. Regions of phase space for which $\omega_r(p)$ is near $0$, $\omega$, and $-\omega$ are excluded.
Away from these regions the perturbative solution works well, 
just as in the rotor approximation for the pendulum. Unfortu- 
nately, some of the more interesting regions of the phase space of 
the driven pendulum are excluded: the region in which we find 
the remnant of the undriven pendulum is excluded, as are the 
two resonance regions in which the rotation of the pendulum is 
synchronous with the drive. We need to develop methods for ap- 
proximating these regions. 
6.4 Nonlinear Resonance


We can develop an approximation for an isolated resonance region as follows. We again consider Hamiltonians of the form

$$H = H_0 + \epsilon H_1, \tag{6.57}$$

where $H_0(t, \theta, p) = H_0(p)$ depends only on the momenta and so is solvable. We assume that the Hamiltonian has no explicit time dependence. We further assume that the coordinates are all angles, and that $H_1$ is a multiply periodic function of the coordinates that can be written

$$H_1(t, \theta, p) = \sum_k A_k(p)\cos(k \cdot \theta). \tag{6.58}$$

Suppose we are interested in a region of phase space for which $n \cdot \omega_0(p)$ is near zero, where $n$ is a tuple of integers, one for each degree of freedom. If we developed the perturbation theory as before, with the generator $W$ that eliminates all terms of order $\epsilon$, then the transformed Hamiltonian would be $H_0$, which is analytically solvable, but there would be terms with $n \cdot \omega_0(p)$ in the denominator. The resulting solution is not applicable near this resonance.

Just as the problem of secular terms was solved by grouping 
more terms with the solvable part of the Hamiltonian, we can 
develop approximations that are valid in the resonance region by 
eliminating fewer terms, and grouping more terms in the solvable 
part. 

To develop a perturbative approximation in the resonance region for which $n \cdot \omega_0(p)$ is near zero, we take the generator $W$ to be

$$W_n(t, \theta, p) = \sum_{k \neq 0,\, k \neq n}\frac{A_k(p)}{k \cdot \omega_0(p)}\sin(k \cdot \theta), \tag{6.59}$$

excluding terms in $W$ that lead to small denominators in this region. The transformed Hamiltonian is

$$H'_n(t, \theta, p) = H_0(p) + \epsilon A_0(p) + \epsilon A_n(p)\cos(n \cdot \theta) + \cdots, \tag{6.60}$$

where the additional terms are higher order in $\epsilon$. By excluding the term $k = n$ from the sum in the generator, that term is left after the transformation.

The transformed Hamiltonian depends only on a single combi- 
nation of angles, so a change of variables can be made so that the 
new transformed Hamiltonian is cyclic in all but one coordinate, 
which is this combination of angles. This transformed Hamilto- 
nian is solvable (reducible to quadratures). 

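For example, suppose there are two degrees of freedom $\theta = (\theta_1, \theta_2)$ and we are interested in a region of phase space in which $n \cdot \omega_0$ is near zero, with $n = (n_1, n_2)$. The combination of angles $n \cdot \theta$ is slowly varying in the resonance region. The transformed Hamiltonian (6.60) is of the form

$$\begin{aligned}
H'_n(t; \theta_1, \theta_2; p_1, p_2) &= H_0(p_1, p_2) + \epsilon A_0(p_1, p_2) \\
&\quad + \epsilon A_n(p_1, p_2)\cos(n_1\theta_1 + n_2\theta_2).
\end{aligned} \tag{6.61}$$

We can transform variables to $\sigma = n_1\theta_1 + n_2\theta_2$, with second coordinate, say, $\theta' = \theta_2$,³ using the $F_2$-type generating function

$$F_2(t; \theta_1, \theta_2; \Sigma, \Theta') = (n_1\theta_1 + n_2\theta_2)\Sigma + \theta_2\Theta'. \tag{6.62}$$

The transformation is

$$\begin{aligned}
p_1 &= n_1\Sigma \\
p_2 &= n_2\Sigma + \Theta' \\
\sigma &= n_1\theta_1 + n_2\theta_2 \\
\theta' &= \theta_2.
\end{aligned} \tag{6.63}$$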


In these variables the transformed resonance Hamiltonian $H'_n$ becomes

$$\begin{aligned}
H''_n(t; \sigma, \theta'; \Sigma, \Theta') &= H_0(n_1\Sigma, n_2\Sigma + \Theta') + \epsilon A_0(n_1\Sigma, n_2\Sigma + \Theta') \\
&\quad + \epsilon A_n(n_1\Sigma, n_2\Sigma + \Theta')\cos\sigma.
\end{aligned} \tag{6.64}$$

This Hamiltonian is cyclic in $\theta'$, so $\Theta'$ is constant. With this constant momentum, the Hamiltonian for the conjugate pair $(\sigma, \Sigma)$ has one degree of freedom. The solutions are level curves of the Hamiltonian. These solutions can be reexpressed in terms of the original phase-space coordinates, and give the evolution under $H'_n$. An approximate solution in the resonance region is therefore

$$(t, \theta, p) = (E_{\epsilon,W_n}\, E'_{t-t_0,H'_n}\, E_{-\epsilon,W_n}\, I)(t_0, \theta_0, p_0). \tag{6.65}$$

If the resonance regions are sufficiently separated, then a global solution can be constructed by splicing together such solutions for each resonance region.

³Any linearly independent combination will be acceptable here.


6.4.1 Pendulum Approximation 


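The resonance Hamiltonian (6.64) has a single degree of freedom and is therefore solvable (reducible to quadratures). We can develop an approximate analytic solution by making use of the fact that the solution is only valid in the immediate vicinity of the resonance. The resonance Hamiltonian can be approximated by a generalized pendulum Hamiltonian.

Let

$$H''_{n,0}(t; \sigma, \theta'; \Sigma, \Theta') = H_0(n_1\Sigma, n_2\Sigma + \Theta') + \epsilon A_0(n_1\Sigma, n_2\Sigma + \Theta') \tag{6.66}$$

and

$$H''_{n,1}(t; \sigma, \theta'; \Sigma, \Theta') = A_n(n_1\Sigma, n_2\Sigma + \Theta')\cos\sigma. \tag{6.67}$$

The resonance Hamiltonian is

$$H''_n = H''_{n,0} + \epsilon H''_{n,1}. \tag{6.68}$$

Define the resonance center $\Sigma_n$ by the requirement that the resonance frequency is zero:

$$\partial_{2,0} H''_{n,0}(t; \sigma, \theta'; \Sigma_n, \Theta') = 0. \tag{6.69}$$

Now expand both parts of the resonance Hamiltonian about the resonance center:

$$\begin{aligned}
H''_{n,0}(t; \sigma, \theta'; \Sigma, \Theta') &= H''_{n,0}(t; \sigma, \theta'; \Sigma_n, \Theta') \\
&\quad + \partial_{2,0} H''_{n,0}(t; \sigma, \theta'; \Sigma_n, \Theta')(\Sigma - \Sigma_n) \\
&\quad + \tfrac{1}{2}\partial_{2,0}^2 H''_{n,0}(t; \sigma, \theta'; \Sigma_n, \Theta')(\Sigma - \Sigma_n)^2 + \cdots
\end{aligned} \tag{6.70}$$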
and

$$H''_{n,1}(t; \sigma, \theta'; \Sigma, \Theta') = H''_{n,1}(t; \sigma, \theta'; \Sigma_n, \Theta') + \cdots. \tag{6.71}$$

The first term in the expansion of $H''_{n,0}$ is a constant and can be ignored. The coefficient of the second term is zero, from the definition of $\Sigma_n$. The third term is the first significant term. We presume here that the first term of the expansion of $A_n$ is a non-zero constant. Now the scale of the separatrix in $\Sigma$ at resonance is typically proportional to $\sqrt{\epsilon}$. So the third term of $H''_{n,0}$ and the first term of $\epsilon H''_{n,1}$ are both proportional to $\epsilon$. Subsequent terms are higher order in $\epsilon$. Keeping only the order $\epsilon$ terms, the approximate resonance Hamiltonian is of the form

$$\frac{(\Sigma - \Sigma_n)^2}{2\alpha'} + \epsilon\beta'\cos\sigma, \tag{6.72}$$

with constants $1/\alpha' = \partial_{2,0}^2 H''_{n,0}(t; \sigma, \theta'; \Sigma_n, \Theta')$ and $\beta' = A_n(n_1\Sigma_n, n_2\Sigma_n + \Theta')$. This is the Hamiltonian for a pendulum with a shifted center in momentum, which is analytically solvable.


Driven pendulum resonances 
Consider the behavior of the periodically driven pendulum in the vicinity of the resonance $\omega_r(p) = \omega$.

The Hamiltonian (6.54) for the driven pendulum has three resonance terms in $H_1$. The full generator (6.56) has three terms that are designed to eliminate the corresponding resonance terms in the Hamiltonian. The resulting approximate solution has small denominators near each of the three resonances $\omega_r(p) = 0$, $\omega_r(p) = \omega$, and $\omega_r(p) = -\omega$.

To develop a resonance approximation near $\omega_r(p) = \omega$, we do not include the corresponding term in the generator, so that the corresponding term is left in the Hamiltonian. It is helpful to give names to the various terms in the full generator (6.56):

$$\begin{aligned}
W^0(\tau; \theta, t; p, T) &= -\frac{\beta}{\omega_r(p)}\sin\theta \\
W^-(\tau; \theta, t; p, T) &= \frac{\gamma}{\omega_r(p) + \omega}\sin(\theta + \omega t) \\
W^+(\tau; \theta, t; p, T) &= \frac{\gamma}{\omega_r(p) - \omega}\sin(\theta - \omega t).
\end{aligned} \tag{6.73}$$

The full generator is $W^0 + W^- + W^+$.
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To investigate the motion in the phase space near the resonance 
w,(p) = w (the ”+” resonance) we use the generator that excludes 
the corresponding term 


W,=W°+W-. (6.74) 


Using this generator the transformed Hamiltonian is 


$$H_+(\tau;\theta,t;p,T) = T + \frac{p^2}{2\alpha} + \epsilon\gamma\cos(\theta-\omega t) + \cdots. \tag{6.75}$$


Excluding the higher order terms, this Hamiltonian depends on the coordinates only through the single combination θ − ωt, and so it can be transformed into a Hamiltonian that is cyclic in all but one degree of freedom. Define the transformation through the mixed-variable generating function


$$F(\tau;\theta,t;\Sigma,T') = (\theta-\omega t)\,\Sigma + t\,T', \tag{6.76}$$


giving the transformation 


$$\begin{aligned}
\sigma &= \theta-\omega t, & t' &= t,\\
p &= \Sigma, & T &= T'-\omega\Sigma.
\end{aligned}\tag{6.77}$$
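The transformation (6.77) can be checked directly from (6.76). For a generating function of this mixed type (old coordinates, new momenta), the old momenta are the partial derivatives of F with respect to the old coordinates, and the new coordinates are its partial derivatives with respect to the new momenta:

```latex
\begin{aligned}
p &= \frac{\partial F}{\partial \theta} = \Sigma, &
T &= \frac{\partial F}{\partial t} = T' - \omega\Sigma,\\
\sigma &= \frac{\partial F}{\partial \Sigma} = \theta - \omega t, &
t' &= \frac{\partial F}{\partial T'} = t.
\end{aligned}
```

Because F does not depend explicitly on τ, the new Hamiltonian is just the old one rewritten in the new variables: substituting p = Σ and T = T' − ωΣ into H_+ is all that is needed.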


Expressed in these new coordinates the resonance Hamiltonian is 


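$$\begin{aligned}
H'_+(\tau;\sigma,t';\Sigma,T') &= T' - \omega\Sigma + \frac{\Sigma^2}{2\alpha} + \epsilon\gamma\cos\sigma\\
&= \frac{(\Sigma-\alpha\omega)^2}{2\alpha} + \epsilon\gamma\cos\sigma + T' - \frac{\alpha\omega^2}{2}.
\end{aligned}\tag{6.78}$$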
This Hamiltonian is cyclic in t', so the solutions are level curves of H'_+ in (σ, Σ). In fact, more can be said here, because H'_+ is already of the form of a pendulum shifted in the Σ direction by αω and shifted by π in phase. The shift by π comes about because the sign of the cosine term is positive rather than negative as in the usual pendulum. A sketch of the level curves is given in figure 6.8.


Exercise 6.2: Resonance width
Verify that the half-width of the resonance region is 2√(αγε).

Figure 6.8 Contours of the resonance Hamiltonian H'_+ give the motion in the (σ, Σ) plane. In this case the resonance Hamiltonian is a generalized pendulum shifted in momentum and phase. The half-width of the resonance oscillation zone is 2√(αγε).


Exercise 6.3: With the computer 


Verify, with the computer, that with the generator W_+ the transformed Hamiltonian is given by equation (6.75).


An approximate solution of the driven pendulum near the ω_r(p) = ω resonance is

$$(\tau;\theta,t;p,T) = \big(E_{\epsilon,W_+}\circ E_{\tau-\tau_0,H_+}\circ E_{-\epsilon,W_+}\big)(\tau_0;\theta_0,t_0;p_0,T_0). \tag{6.79}$$


To find out to what extent the approximate solution models the actual driven pendulum, we make a surface of section using this approximate solution and compare it to a surface of section for the actual driven pendulum. The surface of section for the approximate solution in the resonance region is shown in figure 6.9. A surface of section for the actual driven pendulum is shown in figure 6.10. The correspondence is surprisingly good, but some features of the actual section are not represented. For instance, there is a small chaotic zone near the actual separatrix. Note that the resonance island is not symmetrical about a line of constant momentum. The resonance Hamiltonian is symmetrical about Σ = αω, so, taken by itself, it would give a symmetric resonance island. The necessary distortion is introduced by the W_+ transformation that eliminates the other resonances. Indeed, in the full section the distortion appears to be generated by the nearby ω_r(p) = 0 resonance "pushing away" nearby features so that it has room to fit.




Figure 6.9 Surface of section of the first-order perturbative solution for the driven pendulum constructed for the region near the resonance ω_r(p) = ω. The parameters of the system are: α = 1, β = 1, γ = 1/4, and ω = 5. Only order ε terms were kept in the Lie series for the W_+ transformation. The perturbative solution captures the essential shape and position of the resonant island it is designed to approximate.


The perturbation solution near the ω_r(p) = 0 resonance merges smoothly with the perturbation solutions for the ω_r(p) = ω and ω_r(p) = −ω resonances. We can make a composite perturbative solution by using the appropriate resonance solution for each region of phase space. A surface of section for the composite perturbative solution is shown in figure 6.10. The corresponding surface of section for the actual driven pendulum is also shown. The perturbative solution captures many features seen on the actual section. However, the first-order perturbative solution does not capture the resonant islands between the two primary resonances or the secondary island chains contained within a primary resonance region. The first-order perturbative solution does not show the chaotic zone near the separatrix apparent in the surface of section for the actual driven pendulum.

Figure 6.10 A composite surface of section for the driven pendulum is constructed by combining the first-order perturbative solution for the region near the resonance ω_r(p) = 0 and the solutions for the regions near the resonances ω_r(p) = ±ω. A corresponding surface of section for the actual driven pendulum is shown below. The parameters of the system are: α = 1, β = 1, γ = 1/4, and ω = 5.

We see, from the comparisons of the sections of the first-order perturbative solutions for the various resonance regions, that the section for the actual driven pendulum can be approximately constructed by combining the approximations developed for each resonance. The shapes of the resonance regions are distorted by the transformations that eliminate the nearby resonances, so the resulting pieces fit together consistently. The predicted width of each resonance region agrees with the actual width: it was not substantially changed by the distortion of the region introduced by the elimination of the other resonance terms. Not all the features of the actual section are reproduced in this composite of first-order approximations: there are chaotic zones and islands that are not accounted for in this collage of first-order approximations.

For larger drives the approximations derived by first-order perturbation theory are worse. In figure 6.11, with a drive five times larger, we lose the invariant curves that separate the resonance regions. The main resonance islands persist, but the chaotic zones near the separatrices have merged into one large chaotic sea.

The first-order perturbative solution for the more strongly driven pendulum in figure 6.11 still approximates the centers of the main resonance islands reasonably well, but it fails as we move out and encounter the secondary islands that are visible in the resonance region for ω_r(p) = ω. Here the approximations for the two regions do not fit together so well. The chaotic sea is found in the region where the perturbative solutions do not match.


6.4.2 Reading the Hamiltonian 


Figure 6.11 Composite surface of section for the driven pendulum constructed by combining the first-order perturbative solution for the region near the resonance ω_r(p) = 0 and the regions near the resonances ω_r(p) = ±ω. A corresponding surface of section for the actual driven pendulum is shown below. The parameters of the system are the same as in figure 6.10 except that γ = 5/4.

The locations and widths of the primary resonance islands can often be read straight off the Hamiltonian, when it is expressed as a Poisson series. For each term in the series for the perturbation there is a corresponding resonance island. The width of the island can often be computed simply from the coefficients in the Hamiltonian. So just by looking at the Hamiltonian we can get a good idea of what sort of behavior we will see on the surface of section. For instance, in the driven pendulum, Hamiltonian (6.53) has three terms. We could anticipate, just from looking at the Hamiltonian, that there are three main resonance islands to be found on the surface of section. We know that these islands will be located where the resonant combination of angles is slow. So for the periodically driven pendulum the resonances occur near ω_r(p) = ω, ω_r(p) = 0, and ω_r(p) = −ω. The approximate widths of the resonance islands can be computed with a simple calculation.
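That simple calculation can be sketched in standard Scheme (not scmutils). The sketch assumes H_0 = p²/(2α), so that ω_r(p) = p/α, and represents each perturbation term c cos(θ − kωt) as a list (k c); the procedure names, the list encoding, and setting ε = 1 are assumptions for illustration. It also assumes, as the half-widths quoted in the resonance overlap discussion imply, that each drive term has coefficient γ. Each term is treated on its own in the pendulum approximation, ignoring the distortion introduced by eliminating the others.

```scheme
;; Hypothetical helpers, not part of scmutils.
;; A perturbation term c cos(theta - k omega t) is represented as (k c).
;; With H0 = p^2/(2 alpha), omega_r(p) = p/alpha; eps is taken to be 1.

(define (term-k term) (car term))
(define (term-c term) (cadr term))

;; Resonance center: theta - k omega t is stationary when
;; p/alpha = k omega, i.e. at p = k alpha omega.
(define (resonance-center alpha omega term)
  (* (term-k term) alpha omega))

;; Pendulum approximation: (p - p_k)^2/(2 alpha) + c cos(...) has a
;; separatrix half-width of 2 sqrt(alpha |c|) in p.
(define (resonance-half-width alpha term)
  (* 2 (sqrt (* alpha (abs (term-c term))))))

;; One (center half-width) pair per term.
(define (resonance-table alpha omega terms)
  (map (lambda (term)
         (list (resonance-center alpha omega term)
               (resonance-half-width alpha term)))
       terms))

;; Driven pendulum with alpha = beta = 1, gamma = 1/4, omega = 5:
;; -beta cos(theta), gamma cos(theta - omega t), gamma cos(theta + omega t).
(define driven-pendulum-table
  (resonance-table 1 5 (list (list 0 -1) (list 1 1/4) (list -1 1/4))))
```

For these parameters the table pairs centers 0, 5, and −5 with half-widths 2, 1, and 1; the first two half-widths are the values 2√(αβ) and 2√(αγ) that enter the resonance overlap estimate.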


6.4.3 Resonance Overlap Criterion 


As the size of the drive increases, the chaotic zones near the separatrices get larger and then merge into a large chaotic sea. The resonance overlap criterion gives an analytic estimate of when this occurs. The basic idea is to compare the sum of the half-widths of neighboring resonances with their separation. If the sum of the half-widths is greater than the separation, then the resonance overlap criterion predicts there will be large-scale chaotic behavior near the overlapping resonances. In the case of the periodically driven pendulum (taking ε = 1), the half-width of the ω_r(p) = 0 resonance is 2√(αβ), and the half-width of the ω_r(p) = ω resonance is 2√(αγ) (see figure 6.12). The separation of the resonances is αω. So resonance overlap occurs if

$$2\sqrt{\alpha\beta} + 2\sqrt{\alpha\gamma} > \alpha\omega. \tag{6.80}$$


The amplitude of the drive enters through γ. Solving, we find the value of γ above which resonance overlap occurs: overlap requires γ > (αω/2 − √(αβ))²/α. For the parameters α = β = 1, ω = 5 used in the above figures, the resonance overlap value of γ is 9/4. We see that, in fact, the chaotic zones have already merged for γ = 5/4. So in this case the resonance overlap criterion overestimates the strength of the resonances that is required to get large-scale chaotic behavior. This is typical of the resonance overlap criterion.
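Solving (6.80) for the threshold drive can be sketched as follows in standard Scheme. The procedure names are hypothetical, and ε is taken to be 1, as in the half-widths above.

```scheme
;; Threshold drive for resonance overlap, from (6.80):
;;   2 sqrt(alpha beta) + 2 sqrt(alpha gamma) > alpha omega
;; which rearranges to sqrt(alpha gamma) > alpha omega / 2 - sqrt(alpha beta).
(define (overlap-gamma alpha beta omega)
  (let ((gap (- (/ (* alpha omega) 2) (sqrt (* alpha beta)))))
    (if (<= gap 0)
        0                          ; primary resonances overlap for any drive
        (/ (* gap gap) alpha))))

;; Direct test of the criterion for a given drive gamma.
(define (overlap? alpha beta gamma omega)
  (> (+ (* 2 (sqrt (* alpha beta)))
        (* 2 (sqrt (* alpha gamma))))
     (* alpha omega)))
```

With α = β = 1 and ω = 5, (overlap-gamma 1 1 5) gives 9/4, and (overlap? 1 1 5/4 5) is false: the criterion predicts no overlap at the drive for which the surfaces of section already show a merged chaotic sea.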

One way of thinking about why the resonance overlap criterion usually overestimates the strength required to get large-scale chaos is that there are other effects that need to be taken into account. For instance, as the drive is increased, second order resonances appear between the primary resonances; these resonances take up space, and so resonance overlap occurs for smaller drive than would be expected by considering the primary resonances alone. Also, the chaotic zones at each separatrix have some width, and they too take up area that must be taken into account.
Figure 6.12 Resonance overlap occurs when the sum of the half-widths of adjacent resonances is larger than the spacing between them.


6.4.4 Resonances in Higher Order Perturbation Theory 


As the drive is increased, a variety of new islands emerge, which are not evident in the original Hamiltonian. To find approximations for motion in these regions we can use higher order perturbation theory. The basic plan is the same as before. At any stage the Hamiltonian (which is perhaps a result of earlier stages of perturbation theory) is expressed as a Poisson series (a multiple-angle Fourier series). The terms that are not resonant in a region of interest are eliminated by a Lie transformation. The remaining resonance terms involve only a single combination of angles and are thus solvable by making a canonical transformation to resonance coordinates. We complete the solution and transform back to the original coordinates.

Let's find a perturbative approximation for the second order islands visible in figure 6.10 between the ω_r(p) = 0 resonance and the ω_r(p) = −ω resonance. The details are messy, so we will just give a few intermediate results.

This resonance is not near the three primary resonances, so we 
can use the full generator (6.56) to eliminate those three primary 
resonance terms from the Hamiltonian. After this perturbation 
step the Hamiltonian is too hairy to look at. 
We expand the transformed Hamiltonian in Poisson form and divide the terms into those that are resonant and those that are not. The terms that are not resonant can be eliminated by a Lie transform. This Lie transform leaves the resonant terms in the Hamiltonian and introduces an additional distortion to the curves on the surface of section. This latter distortion is small in this case, and very messy to compute, so we do not include it. The resonance Hamiltonian is then (after considerable algebra)


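$$H_{2:1}(\tau;\theta,t;p,T) = T + \frac{p^2}{2\alpha} + \epsilon^2\,\frac{\alpha\beta\gamma\,(\alpha^2\omega^2 + 2\alpha\omega p + 2p^2)}{4p^2(\alpha\omega+p)^2}\,\cos(2\theta+\omega t). \tag{6.81}$$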


This is solvable because there is only a single combination of co- 
ordinates. 

We can get an analytic solution by making the pendulum approximation. The Hamiltonian is already quadratic in the momentum p, so all we need to do is evaluate the coefficient of the potential term at the resonance center p_{2:1} = −αω/2. The resonance Hamiltonian, in the pendulum approximation, is

$$H_{2:1}(\tau;\theta,t;p,T) = T + \frac{p^2}{2\alpha} + \epsilon^2\,\frac{2\beta\gamma}{\alpha\omega^2}\,\cos(2\theta+\omega t). \tag{6.82}$$
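The pendulum-approximation coefficient can be checked numerically. The sketch below, in standard Scheme with hypothetical procedure names and ε = 1, evaluates the coefficient of cos(2θ + ωt) in (6.81) at the resonance center fixed by 2ω_r(p) + ω = 0 (with ω_r(p) = p/α, so p = −αω/2) and compares it with 2βγ/(αω²).

```scheme
;; Coefficient of cos(2 theta + omega t) in (6.81), with eps = 1:
;;   alpha beta gamma (alpha^2 omega^2 + 2 alpha omega p + 2 p^2)
;;   / (4 p^2 (alpha omega + p)^2)
(define (coefficient-2-1 alpha beta gamma omega p)
  (let ((s (+ (* alpha omega) p)))
    (/ (* alpha beta gamma
          (+ (* alpha alpha omega omega) (* 2 alpha omega p) (* 2 p p)))
       (* 4 p p s s))))

;; Resonance center: 2 omega_r(p) + omega = 0 with omega_r(p) = p/alpha.
(define (center-2-1 alpha omega)
  (- (/ (* alpha omega) 2)))

;; Value used in the pendulum approximation (6.82).
(define (pendulum-coefficient-2-1 alpha beta gamma omega)
  (/ (* 2 beta gamma) (* alpha omega omega)))
```

For α = β = 1, γ = 1/4, and ω = 5 both procedures give 1/50; substituting p = −αω/2 by hand shows the agreement holds for any parameters.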
Carrying out the transformation to the resonance variable σ = 2θ + ωt reduces this to a pendulum Hamiltonian with a single degree of freedom. Combining the analytic solution of this pendulum Hamiltonian with the transformations generated by the full W, we get an approximate perturbative solution


$$(\tau;\theta,t;p,T) = \big(E_{\epsilon,W}\circ E_{\tau-\tau_0,H_{2:1}}\circ E_{-\epsilon,W}\big)(\tau_0;\theta_0,t_0;p_0,T_0). \tag{6.83}$$


A surface of section in the appropriate resonance region using this 
solution is shown in figure 6.13. Comparing this to the actual sur- 
face of section (figure 6.10) we see that the approximate solution 
provides a good representation of this resonance motion. 


6.4.5 Stability of Inverted Vertical Equilibrium 


As a second application, we use second order perturbation theory 
to investigate the inverted vertical equilibrium of the periodically 
driven pendulum. 
Figure 6.13 Second order perturbation theory gives an approximation to the second order islands near the resonance 2ω_r(p) + ω = 0.


The procedure parallels the one just followed, but here we focus on a different set of resonance terms. The terms that are slowly varying near the vertical equilibrium are those that involve θ but do not involve t, such as cos θ and cos(2θ). So we want to use the generator W^+ + W^- that eliminates the nonresonant terms involving combinations of θ and ωt, while leaving the central resonance. After the Lie transform of the Hamiltonian with this generator, we write the transformed Hamiltonian as a Poisson series and collect the resonant terms. The transformed resonance Hamiltonian is


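$$H_v(\tau;\theta,t;p,T) = T + \frac{p^2}{2\alpha} - \epsilon\beta\cos\theta + \epsilon^2\,\frac{\alpha\gamma^2\,(\alpha^2\omega^2+p^2)}{2(\alpha^2\omega^2-p^2)^2}\,\big(1-\cos(2\theta)\big) + \cdots. \tag{6.84}$$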


The upper plot of figure 6.14 shows contours of this resonance Hamiltonian H_v, and the lower plot shows a surface of section for the actual driven pendulum for the same parameters. The behavior of the resonance Hamiltonian is indistinguishable from that of the actual driven pendulum. The theory does especially well here; there are no nearby resonances because the drive frequency is high.

We can get an analytic estimate for the stability of the inverted vertical equilibrium by carrying out a linear stability analysis of the resonance Hamiltonian about the fixed point θ = π, p = 0. The algebra is somewhat simpler if we first make the pendulum approximation about the resonance center. The resonance Hamiltonian is then approximately

$$H_v(\tau;\theta,t;p,T) \approx T + \frac{p^2}{2\alpha} - \epsilon\beta\cos\theta - \epsilon^2\,\frac{\gamma^2}{2\alpha\omega^2}\,\cos(2\theta) + \cdots. \tag{6.85}$$
Linear stability analysis of the inverted vertical equilibrium indicates (with ε = 1) stability for

$$\gamma^2 > \tfrac{1}{2}\,\alpha\beta\omega^2. \tag{6.86}$$


In terms of the original physical parameters, the vertical equilibrium is linearly stable if

$$\frac{\omega}{\omega_s}\,\frac{A}{l} > \sqrt{2}, \tag{6.87}$$

where ω_s = √(g/l) is the small-amplitude oscillation frequency. For the vertical equilibrium to be stable, the scaled product of the amplitude of the drive and the drive frequency must be sufficiently large.
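The estimate (6.87) is easy to evaluate for particular parameters. The following standard-Scheme sketch (procedure names hypothetical) encodes only the analytic perturbative estimate; it does not capture the additional instability at very high drive amplitudes discussed below.

```scheme
;; Analytic stability estimate (6.87) for the inverted vertical equilibrium:
;;   (omega / omega_s) (A / l) > sqrt 2,  with omega_s = sqrt(g / l).
(define (small-oscillation-frequency g l)
  (sqrt (/ g l)))

(define (inverted-stable-estimate? g l A omega)
  (> (* (/ omega (small-oscillation-frequency g l)) (/ A l))
     (sqrt 2)))
```

With the parameters of figure 6.14 (g = 9.8 m/s², l = 1 m, A = 0.03 m, ω = 100ω_s) the scaled product is 3, so the estimate predicts stability; reducing A to 0.01 m gives 1, below √2.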

This analytic estimate is compared with the behavior of the 
driven pendulum in figure 6.15. For any given assignment of the 
parameters the driven pendulum can be tested for the linear sta- 
bility of the inverted vertical equilibrium by the methods of chap- 
ter 4: numerically, this involves determining the roots of the char- 
acteristic polynomial for a reference orbit at the resonance center. 
In the figure the stability of the inverted vertical equilibrium was 
assessed at each point of a grid of assignments of the parameters. 
A dot is shown for combinations of parameters that are linearly 
stable. The diagonal line is the analytic boundary of the region of stability of the inverted equilibrium: (ω/ω_s)(A/l) = √2. We see
that the boundary of the region of stability is well approximated 
by the analytic estimate derived from the perturbation theory. 
Note that for very high drive amplitudes there is another region 
of instability, which is not captured by this perturbation analysis. 
Figure 6.14 Contours of the resonance Hamiltonian H_v, which has been developed to study the stability of the vertical equilibrium, are shown in the upper plot. A corresponding surface of section for the actual driven pendulum is shown in the lower plot. The parameters are m = 1 kg, l = 1 m, g = 9.8 m/s², A = 0.03 m, and ω = 100ω_s, where ω_s = √(g/l).
Figure 6.15 Stability of the inverted vertical equilibrium over a range of parameters. The full parameter space displayed was sampled over a regular grid. The dots indicate parameters for which the actual driven pendulum is linearly stable; nothing is plotted in the case of instability. The diagonal line is the locus of points satisfying (ω/ω_s)(A/l) = √2.


6.5 Projects 


Exercise 6.4: Periodically driven pendulum 


a. Work out the details of the perturbation theory for the primary driven 
pendulum resonances, as displayed in figure 6.10. 


b. Work out the details of the perturbation theory for the stability of 
the inverted vertical equilibrium. Derive the resonance Hamiltonian, 
and plot its contours. Compare these contours to surfaces of section for 
a variety of parameters. 


c. Carry out the linear stability analysis leading to equation (6.87). What is happening in the upper part of figure 6.15? Why is the system unstable when criterion (6.87) predicts stability? Use surfaces of section to investigate this parameter regime.
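Exercise 6.5: Spin-orbit coupling

A Hamiltonian for the spin-orbit problem, described in section 2.11.2, is

$$\begin{aligned}
H(t,\theta,p_\theta) &= \frac{p_\theta^2}{2C} - \frac{n^2\epsilon^2 C}{4}\left(\frac{a}{R(t)}\right)^3\cos 2(\theta - f(t))\\
&= \frac{p_\theta^2}{2C} - \frac{n^2\epsilon^2 C}{4}\left(\cos(2\theta-2nt) + \frac{7e}{2}\cos(2\theta-3nt) - \frac{e}{2}\cos(2\theta-nt) + \cdots\right),
\end{aligned}\tag{6.88}$$

where the ignored terms are higher order in the eccentricity e.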


a. Find the widths and centers of the three primary resonances. Com- 
pare the predictions for the widths to the island widths seen on surfaces 
of section. Write the criterion for resonance overlap and compare to 
numerical experiments for the transition to large-scale chaos. 


b. The fixed point of the synchronous island is offset from the average 
rate of rotation. This is indicative of a “forced” oscillation of the ro- 
tation of the Moon. Develop a perturbative theory for motion in the 
synchronous island by using a Lie transform to eliminate the two non- 
synchronous resonances. Predict the location of the fixed point at the 
center of the synchronous resonance on the surface of section, and thus 
predict the amplitude of the forced oscillation of the Moon. 




Appendix: Our Notation 


An adequate notation should be understood by at 
least two people, one of whom may be the author. 


Abdus Salam, (1950). 


We adopt a functional mathematical notation that is close to that 
used by Spivak in his Calculus on Manifolds. The use of func- 
tional notation avoids many of the ambiguities of traditional math- 
ematical notation; the ambiguities of traditional notation can be 
an impediment to clear reasoning in classical mechanics. Func- 
tional notation carefully distinguishes the function from the value 
of the function when applied to particular arguments. In func- 
tional notation mathematical expressions are unambiguous and 
self-contained. 

We adopt a generic arithmetic in which the basic arithmetic 
operations, such as addition and multiplication, are extended to 
a wide variety of mathematical types. Thus, for example, the ad- 
dition operator + can be applied to numbers, tuples of numbers, 
matrices, functions, etc. Generic arithmetic formalizes the com- 
mon informal practice that is used to manipulate mathematical 
objects. 

We often want to manipulate aggregate quantities, such as the 
collection of all of the rectangular coordinates of a collection of 
particles, without explicitly manipulating the component parts. 
Tensor arithmetic provides a traditional way of manipulating ag- 
gregate objects: Indices label the parts, and conventions, such as 
the summation convention, are introduced to manipulate the in- 
dices. We introduce a tuple arithmetic as an alternative way of 
manipulating aggregate quantities that usually allows us to avoid 
labelling the parts with indices. Tuple arithmetic is inspired by 
tensor arithmetic, but it is more general: not all of the components 
of a tuple need to be of the same size or type. 

The mathematical notation is in one-to-one correspondence with the expressions of the computer language Scheme [21]. Scheme is based on the λ-calculus [12] and directly supports the manipulation of functions. We augment Scheme with symbolic, numerical, and generic features to support our applications. For a simple introduction to Scheme see Appendix 8. The correspondence between the mathematical notation and Scheme requires that mathematical expressions be unambiguous and self-contained. Scheme provides immediate feedback in verification of mathematical deductions, and facilitates the exploration of the behavior of systems.


Functions 

The value of the function f, given the argument x, is written f(x). The expression f(x) denotes the value of the function at the given argument; when we wish to denote the function we write just f.
Functions may take several arguments. For example, we may have 
the function that gives the Euclidean distance between two points 
in the plane given by their rectangular coordinates. 


$$d(x_1, y_1, x_2, y_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. \tag{7.1}$$


In Scheme we can write this: 


(define (d x1 y1 x2 y2) 
(sqrt (+ (square (- x2 x1)) (square (- y2 y1))))) 


Functions may be composed if the range of one overlaps the domain of the other. The composition of functions is constructed by passing the output of one to the input of the other. We write the composition of two functions using the ∘ operator:

$$(f\circ g) : x \mapsto (f\circ g)(x) = f(g(x)). \tag{7.2}$$


A procedure h that computes the cube of the sine of its argument 
may be defined by composing the procedures cube and sin: 


(define h (compose cube sin))

(h 2)
.7518269446689928

which is the same as

(cube (sin 2))
.7518269446689928
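For readers without scmutils, two-function composition can be sketched in standard Scheme. The name my-compose is hypothetical (scmutils provides its own compose), and cube is defined here because standard Scheme does not supply it.

```scheme
;; Composition of two one-argument functions: (f o g)(x) = f(g(x)).
(define (my-compose f g)
  (lambda (x) (f (g x))))

(define (cube x) (* x x x))

;; The procedure h from the example above, rebuilt with my-compose.
(define h (my-compose cube sin))
```

Applying h to 2 performs exactly the same floating-point operations as (cube (sin 2)), so the two results agree.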




Arithmetic is extended to the manipulation of functions: the 
usual mathematical operations may be applied to functions. Ex- 
amples are addition and multiplication; we may add or multiply 
two functions if they take the same kinds of arguments and if their 
values can be added or multiplied: 


$$(f + g)(x) = f(x) + g(x),\qquad (fg)(x) = f(x)\,g(x). \tag{7.3}$$


A procedure g that multiplies the cube of its argument by the sine 
of its argument is: 


(define g (* cube sin)) 


(g 2) 
7.274379414605454 


(* (cube 2) (sin 2)) 
7.274379414605454 
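The rule (7.3) amounts to lifting an operation on values to an operation on functions. A minimal standard-Scheme sketch follows; lift2, fn+, and fn* are hypothetical names, and scmutils achieves the same effect through its generic arithmetic rather than through separate procedures like these.

```scheme
;; Turn a binary operation on values into one on functions:
;; ((lift2 op) f g) is the function x -> (op (f x) (g x)).
(define (lift2 op)
  (lambda (f g)
    (lambda (x) (op (f x) (g x)))))

(define fn+ (lift2 +))   ; (f + g)(x) = f(x) + g(x)
(define fn* (lift2 *))   ; (fg)(x)    = f(x) g(x)

;; Cube times sine, as in the example above.
(define cube-times-sine (fn* (lambda (x) (* x x x)) sin))
```

For example, (cube-times-sine 2) multiplies 8 by (sin 2), and ((fn+ sin cos) 0) returns 1.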


Symbolic values 

As in usual mathematical notation, arithmetic is extended to al- 
low the use of symbols that represent unknown or incompletely 
specified mathematical objects. These symbols are manipulated 
as if they had values of a known type. By default, a Scheme symbol is assumed to represent a real number. So the expression 'a is a literal Scheme symbol that represents an unspecified real number.

(print-expression
 ((compose cube sin) 'a))
(expt (sin a) 3)


The procedure print-expression simplifies the expression, re- 
moves the type tags, and displays it in a readable form. We can 
use the simplifier to verify a trigonometric identity: 


(print-expression
 ((- (+ (square sin) (square cos)) 1) 'a))
0


Just as it is useful to be able to manipulate symbolic numbers, it is useful to be able to manipulate symbolic functions. The procedure literal-function makes a procedure that acts as a function having no properties other than its name. By default, a literal function is defined to take one real argument and produce one real value. For example, we may want to work with a function f : R → R.

(print-expression
 ((literal-function 'f) 'x))
(f x)

(print-expression
 ((compose (literal-function 'f) (literal-function 'g)) 'x))
(f (g x))


We can also make literal functions of multiple, possibly structured arguments that return structured values. For example, to denote a literal function named g that takes two real arguments and returns a real value (g : R × R → R) we may write:

(define g (literal-function 'g (-> (X Real Real) Real)))

(print-expression (g 'x 'y))
(g x y)


We may use such a literal function anywhere that an explicit func- 
tion of the same type may be used. 

There is a whole language for describing types of literal func- 
tions in terms of the types and numbers of their arguments and 
the types of their values. Here we describe a function that maps 
pairs of real numbers to real numbers with the expression: (-> (X 
Real Real) Real). Later we will introduce structured arguments 
and values and we will show the extensions of literal functions to 
handle these. 


Tuples 

There are two kinds of tuples: up tuples and down tuples. We write tuples as ordered lists of their components; a tuple is delimited by parentheses if it is an up tuple and it is delimited by square brackets if it is a down tuple. For example, the up tuple v of velocity components v^0, v^1, and v^2 is

$$v = (v^0, v^1, v^2). \tag{7.4}$$

The down tuple p of momentum components p_0, p_1, and p_2 is

$$p = [p_0, p_1, p_2]. \tag{7.5}$$
A component of an up tuple is usually identified with a superscript. A component of a down tuple is usually identified with a subscript. We use zero-based indexing when referring to tuple elements. This notation follows the usual convention in tensor arithmetic.

In Scheme we make tuples with the constructors up and down.

(define v (up 'v^0 'v^1 'v^2))

(print-expression v)
(up v^0 v^1 v^2)

(define p (down 'p_0 'p_1 'p_2))

(print-expression p)
(down p_0 p_1 p_2)


Tuple arithmetic is different from the usual tensor arithmetic 
in that the components of a tuple may also be tuples and different 
components need not have the same structure. For example, a 
tuple structure s of phase-space states is 


$$s = (t, (x, y), [p_x, p_y]). \tag{7.6}$$


It is an up tuple of the time, the coordinates, and the momenta. The time t has no substructure. The coordinates are an up tuple of the coordinate components x and y. The momentum is a down tuple of the momentum components p_x and p_y. In Scheme:

(define s (up 't (up 'x 'y) (down 'p_x 'p_y)))


In order to reference components of tuple structures there is a 
class of selector functions. For example: 


$$\begin{aligned}
I(s) &= s, & I_0(s) &= t, & I_1(s) &= (x, y),\\
I_2(s) &= [p_x, p_y], & I_{1,0}(s) &= x, & I_{2,1}(s) &= p_y.
\end{aligned}\tag{7.7}$$

The sequence of integer subscripts on the selector describes the access chain to the desired component.

The procedure component is the general selector procedure that implements the selector function I for any sequence of subscripts:


((component 0 1) (up (up 'a 'b) (up 'c 'd)))
b

To access a component of a tuple we may also use the selector procedure ref, which takes a tuple and an index and returns the indicated element of the tuple:

(ref (up 'a 'b 'c) 1)
b

We use zero-based indexing everywhere. The procedure ref can be used to access any substructure of a tree of tuples:

(ref (up (up 'a 'b) (up 'c 'd)) 0 1)
b
b 


Two up tuples of the same length may be added or subtracted, 
elementwise, to produce an up tuple, if the components are com- 
patible for addition. Similarly, two down tuples of the same length 
may be added or subtracted, elementwise, to produce a down tu- 
ple, if the components are compatible for addition. 

Any tuple may be multiplied by a number, by multiplying each 
component by the number. Numbers may, of course, be mul- 
tiplied. Tuples that are compatible for addition form a vector 
space. 

Two tuples are said to be compatible for contraction if they are 
of opposite types, they are of the same length, and their corre- 
sponding elements are compatible for contraction. If two tuples 
are compatible for contraction then generic multiplication is in- 
terpreted to be contraction: The result is the sum of the products 
of corresponding components of the tuples. For example, p and v introduced above are compatible for contraction; the product is

$$pv = p_0 v^0 + p_1 v^1 + p_2 v^2. \tag{7.8}$$

So the product of tuples that are compatible for contraction is an inner product. Contraction of tuples is commutative: pv = vp. Using the tuples p and v defined above,

(print-expression
 (* p v))
(+ (* p_0 v^0) (* p_1 v^1) (* p_2 v^2))
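The contraction rule for flat tuples of numbers can be sketched in standard Scheme. The tagged-list representation and the names below are hypothetical stand-ins, not how scmutils stores tuples, and nested components (which would need the rule applied recursively) are not handled.

```scheme
;; Hypothetical stand-in for up/down tuples: a list tagged with 'up or 'down.
(define (make-up . xs) (cons 'up xs))
(define (make-down . xs) (cons 'down xs))
(define (tuple-type t) (car t))
(define (tuple-components t) (cdr t))

;; Contraction of two flat numeric tuples: they must be of opposite types
;; and the same length; the result is the sum of products of
;; corresponding components.
(define (contract a b)
  (if (or (eq? (tuple-type a) (tuple-type b))
          (not (= (length (tuple-components a))
                  (length (tuple-components b)))))
      (error "tuples are not compatible for contraction")
      (apply + (map * (tuple-components a) (tuple-components b)))))
```

For example, (contract (make-down 1 2 3) (make-up 4 5 6)) gives 1·4 + 2·5 + 3·6 = 32, and swapping the arguments gives the same result, as the commutativity of contraction requires.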


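Tuple structures can be made to represent linear transformations. For example, the rotation commonly represented by the matrix

$$\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} \tag{7.9}$$

can be represented as a tuple structure:¹

$$\left[\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}\right]. \tag{7.10}$$

Such a tuple is compatible for contraction with an up tuple that represents a vector. So, for example:

$$\left[\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}\right]\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}x\cos\theta - y\sin\theta\\ x\sin\theta + y\cos\theta\end{pmatrix}. \tag{7.11}$$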

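Two tuples that represent linear transformations, though not compatible for contraction, may also be combined by multiplication. In this case the product represents the composition of the linear transformations. For example, the product of the tuples representing two rotations is

$$\left[\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}\right]\left[\begin{pmatrix}\cos\varphi\\ \sin\varphi\end{pmatrix}\begin{pmatrix}-\sin\varphi\\ \cos\varphi\end{pmatrix}\right] = \left[\begin{pmatrix}\cos(\theta+\varphi)\\ \sin(\theta+\varphi)\end{pmatrix}\begin{pmatrix}-\sin(\theta+\varphi)\\ \cos(\theta+\varphi)\end{pmatrix}\right]. \tag{7.12}$$

Multiplication of tuples that represent linear transformations is associative but generally not commutative, just as the composition of the transformations is associative but not generally commutative.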

The actual rule for multiplying two structures that are not compatible for contraction is simple. If A and B are not compatible for contraction, the product AB is a tuple of the same type as B, whose components are the products of A and the components of B. The same rule is applied recursively in multiplying the components. So if B = (B^0, B^1, B^2), the product of A and B is

$$AB = (AB^0, AB^1, AB^2). \tag{7.13}$$

If A and C are not compatible for contraction and C = [C_0, C_1, C_2], the product is

$$AC = [AC_0, AC_1, AC_2]. \tag{7.14}$$

¹ The arrangement of the components of a tuple structure is not significant, unlike in matrix notation: we might just as well have written this tuple as [(cos θ, sin θ), (−sin θ, cos θ)].


Caution: Multiplication of tuples that are compatible for contraction is, in general, not associative. For example, let u = (5, 2), v = (11, 13), and g = [[3, 5], [7, 9]]. Then u(gv) = 964, but (ug)v = 878. The expression ugv is ambiguous. An expression that has this ambiguity does not arise in this book.
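The two groupings in this example can be reproduced with plain lists in standard Scheme. The encoding is a hypothetical one chosen for the sketch: u and v are up tuples of numbers, g is a down tuple of down tuples, and contracting g with an up tuple w gives the down tuple w^0 g_0 + w^1 g_1.

```scheme
(define (scale k xs) (map (lambda (x) (* k x)) xs))
(define (add xs ys) (map + xs ys))
(define (dot xs ys) (apply + (map * xs ys)))

;; Contract g = [g_0, g_1, ...] with an up tuple w = (w^0, w^1, ...):
;; the result is the down tuple w^0 g_0 + w^1 g_1 + ...
(define (contract-with-g g w)
  (let loop ((gs g)
             (ws w)
             (acc (map (lambda (x) 0) (car g))))
    (if (null? gs)
        acc
        (loop (cdr gs) (cdr ws) (add acc (scale (car ws) (car gs)))))))

(define u '(5 2))
(define v '(11 13))
(define g '((3 5) (7 9)))

(define u-of-gv (dot u (contract-with-g g v)))   ; u(gv)
(define ug-of-v (dot (contract-with-g g u) v))   ; (ug)v
```

Here gv = [124, 172] and ug = [29, 43], so u(gv) = 964 while (ug)v = 878.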


Derivatives 
The derivative of a function f is a function. It is denoted by 
Df. Our notational convention is that D is a high-precedence 
operator. Thus D operates on the adjacent function before any 
other application occurs: D f(x) is the same as (Df)(x). Higher- 
order derivatives are described by exponentiating the derivative 
operator. Thus the nth derivative of a function f is notated by D^n f.

The Scheme procedure for producing the derivative of a function 
is named D. The derivative of the sin procedure is a procedure that 
computes cos: 


(define derivative-of-sine (D sin)) 


(print-expression (derivative-of-sine 'x))
(cos x)


The derivative of a function f is the function Df whose value for a particular argument is something that can be multiplied by an increment Δx in the argument to get a linear approximation to the increment in the value of f:

$$f(x + \Delta x) \approx f(x) + Df(x)\,\Delta x. \tag{7.15}$$


For example, let f be the function that cubes its argument ($f(x) = x^3$); then Df is the function that yields three times the square of its argument ($Df(y) = 3y^2$). So $f(5) = 125$ and $Df(5) = 75$. The value of f with argument $x + \Delta x$ is

$$f(x + \Delta x) = (x + \Delta x)^3 = x^3 + 3x^2\,\Delta x + 3x\,(\Delta x)^2 + (\Delta x)^3 \tag{7.16}$$

and

$$Df(x)\,\Delta x = 3x^2\,\Delta x. \tag{7.17}$$

So $Df(x)$ multiplied by $\Delta x$ gives us the term in $f(x + \Delta x)$ that is linear in $\Delta x$, providing a good approximation to $f(x + \Delta x) - f(x)$ when $\Delta x$ is small.
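The cube example can be checked with a few lines of plain Scheme. This sketch writes Df by hand (it does not use the D procedure) and relies on Scheme's exact rational arithmetic, so the leftover after subtracting the linear term comes out exactly; the procedure name remainder-term is hypothetical.

```scheme
;; f(x) = x^3 and its derivative Df(x) = 3x^2, written out by hand.
(define (f x) (* x x x))
(define (Df x) (* 3 x x))

;; What is left of f(x+dx) - f(x) after removing the linear term Df(x)dx.
;; (remainder-term is a hypothetical name chosen for this sketch.)
(define (remainder-term x dx)
  (- (f (+ x dx)) (f x) (* (Df x) dx)))

(f 5)                      ; => 125
(Df 5)                     ; => 75
(remainder-term 5 1/1000)  ; => 15001/1000000000, i.e. 3*5*dx^2 + dx^3
```

Shrinking dx by a factor of 10 shrinks the remainder by roughly a factor of 100, as expected for terms of second and higher order in the increment.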

Derivatives are operators. An operator is like a function except that multiplication of operators is interpreted as composition, whereas multiplication of functions is multiplication of the values (see equation 7.3). If D were an ordinary function, then the rule for multiplication would imply that $D^2 f$ would just be the product of Df with itself, which is not what is intended. Arithmetic is extended to allow manipulation of operators. A typical operator is

$$(D + 1)(D - 1) = D^2 - 1,$$


which subtracts a function from its second derivative. The 1 acts 
as the identity operator: When arithmetically combined with op- 
erators, a number is treated as an operator that multiplies its 
input by the number. Such an operator can be constructed and 
used in Scheme: 


(print-expression
 (((* (- D 1) (+ D 1)) (literal-function 'f)) 'x))
(+ (((expt D 2) f) x) (* -1 (f x)))
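The idea that operator multiplication is composition, and that a number acts as a multiplier, can be modeled without scmutils by treating an operator as a procedure that takes a function and returns a function. In this hedged sketch D is replaced by a central-difference approximation, deriv-op, with an exact step of 1/1000; that choice and the names as-operator, op*, op+, and op- are assumptions, not the scmutils implementation.

```scheme
;; Operators as procedures from functions to functions (a model, not scmutils).
(define step 1/1000)                   ; assumed finite-difference step

(define (deriv-op f)                   ; central-difference stand-in for D
  (lambda (x) (/ (- (f (+ x step)) (f (- x step))) (* 2 step))))

;; A number n, combined with operators, acts as "multiply the function by n".
(define (as-operator a)
  (if (procedure? a) a (lambda (f) (lambda (x) (* a (f x))))))

;; The product of two operators is their composition.
(define (op* A B)
  (let ((A (as-operator A)) (B (as-operator B)))
    (lambda (f) (A (B f)))))

(define (op+ A B)
  (let ((A (as-operator A)) (B (as-operator B)))
    (lambda (f) (lambda (x) (+ ((A f) x) ((B f) x))))))

(define (op- A B)
  (let ((A (as-operator A)) (B (as-operator B)))
    (lambda (f) (lambda (x) (- ((A f) x) ((B f) x))))))

(define (x-cubed x) (* x x x))
(define L (op* (op- deriv-op 1) (op+ deriv-op 1)))   ; (D - 1)(D + 1)

((L x-cubed) 2)   ; => 4, that is, D^2 f(2) - f(2) = 12 - 8
((L sin) 1.0)     ; close to -2 sin(1)
```

For a cubic, the nested central differences happen to be exact in rational arithmetic, so the first call returns exactly 4; for sin the result only approximates −2 sin 1.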


Derivatives of functions of multiple arguments 

The derivative generalizes to functions that take multiple argu- 
ments. The derivative of a real-valued function of multiple argu- 
ments is an object whose contraction with the tuple of increments 
in the arguments gives a linear approximation to the increment in 
the function’s value. 

A function of multiple arguments can be thought of as a function of an up tuple of those arguments. An incremental argument tuple is then an up tuple of components, one for each argument position, so the derivative of such a function is a down tuple of the partial derivatives of the function with respect to each argument position.

Suppose we have a real-valued function g of two real-valued arguments, and we want to approximate the increment in the value of g from its value at $(x, y)$. If the arguments are incremented by the tuple $(\Delta x, \Delta y)$ we compute:

$$Dg(x, y)(\Delta x, \Delta y) = [\partial_0 g(x, y), \partial_1 g(x, y)] \cdot (\Delta x, \Delta y) = \partial_0 g(x, y)\,\Delta x + \partial_1 g(x, y)\,\Delta y. \tag{7.18}$$


Using the two-argument literal function g defined above: 


(print-expression ((D g) 'x 'y))
(down (((partial 0) g) x y) (((partial 1) g) x y))


In general, partial derivatives are just the components of the 
derivative of a function that takes multiple arguments (or struc- 
tured arguments or both, see below). So a partial derivative of a 
function is a composition of a component selector and the deriva- 
tive of that function. Indeed: 


$$\partial_0 g = I_0 \circ Dg \tag{7.19}$$

$$\partial_1 g = I_1 \circ Dg. \tag{7.20}$$


Concretely, if

$$g(x, y) = x^3 y^5 \tag{7.21}$$

then

$$Dg(x, y) = [3x^2y^5, 5x^3y^4] \tag{7.22}$$

and the first-order approximation of the increment for changing the arguments by $\Delta x$ and $\Delta y$ is

$$g(x + \Delta x, y + \Delta y) - g(x, y) \approx [3x^2y^5, 5x^3y^4] \cdot (\Delta x, \Delta y) = 3x^2y^5\,\Delta x + 5x^3y^4\,\Delta y. \tag{7.23}$$
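Equation (7.23) can be checked numerically. In this sketch the down tuple Dg(x, y) is written by hand as a plain list, and contraction with the increment is a sum of componentwise products; the helper names contract, linear-increment, and actual-increment are hypothetical. Exact rational increments avoid rounding error.

```scheme
(define (g x y) (* (expt x 3) (expt y 5)))        ; g(x, y) = x^3 y^5

(define (Dg x y)                                  ; [3x^2 y^5, 5x^3 y^4] as a list
  (list (* 3 (expt x 2) (expt y 5))               ; partial 0
        (* 5 (expt x 3) (expt y 4))))             ; partial 1

;; Contraction of a down tuple with an up tuple of the same length.
(define (contract down-tuple up-tuple)
  (apply + (map * down-tuple up-tuple)))

(define (linear-increment x y dx dy) (contract (Dg x y) (list dx dy)))
(define (actual-increment x y dx dy) (- (g (+ x dx) (+ y dy)) (g x y)))

(Dg 1 2)                              ; => (96 80)
(linear-increment 1 2 1/1000 1/1000)  ; => 22/125, i.e. 0.176
(- (actual-increment 1 2 1/1000 1/1000)
   (linear-increment 1 2 1/1000 1/1000))   ; small, second order in the increments
```

The leftover difference, about 4.2 × 10⁻⁴ here, comes from the second-order terms that the linear approximation omits.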


Mathematical notation usually does not distinguish functions of multiple arguments from functions of the tuple of arguments. Let $h((x, y)) = g(x, y)$. The function h, which takes a tuple of arguments x and y, is not distinguished from the function g that takes arguments x and y. We use both ways of defining functions of multiple arguments. The derivatives of both kinds of functions are compatible for contraction with a tuple of increments to the arguments. Scheme comes in handy here:


(define (h s)
  (g (ref s 0) (ref s 1)))


(print-expression
 (h (up 'x 'y)))
(g x y)


(print-expression ((D g) 'x 'y))
(down (((partial 0) g) x y) (((partial 1) g) x y))


(print-expression ((D h) (up 'x 'y)))
(down (((partial 0) g) x y) (((partial 1) g) x y))
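Outside scmutils, the correspondence between g and h can be made explicit with two small converters. This is a sketch under the assumption that a tuple is a Scheme vector; tuple-version and spread-version are hypothetical names.

```scheme
;; f(x, y, ...)  ->  h((x, y, ...)), with the tuple represented as a vector
(define (tuple-version f)
  (lambda (s) (apply f (vector->list s))))

;; h((x, y, ...))  ->  f(x, y, ...)
(define (spread-version h)
  (lambda args (h (list->vector args))))

(define (g x y) (* (expt x 3) (expt y 5)))
(define h (tuple-version g))

(h (vector 1 2))           ; => 32, the same as (g 1 2)
((spread-version h) 1 2)   ; => 32
```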


A phase-space state function is a function of time, coordinates, and momenta. Let H be such a function. The value of H is $H(t, (x, y), [p_x, p_y])$ for time t, coordinates $(x, y)$, and momenta $[p_x, p_y]$. Let s be the phase-space state tuple as in (7.6):

$$s = (t, (x, y), [p_x, p_y]). \tag{7.24}$$

The value of H for argument tuple s is $H(s)$. We use both ways of writing the value of H.

We often show the use of a function of multiple arguments that include tuples by indicating the boundaries of the argument tuples with semicolons and separating their components with commas. If H is a function of phase-space states with arguments t, $(x, y)$, and $[p_x, p_y]$, we may write $H(t; x, y; p_x, p_y)$. This notation loses the up/down distinction, but our semicolon-and-comma notation is convenient and reasonably unambiguous.

The derivative of H is a function that produces an object that 
can be contracted with an increment in the argument structure to 
produce an increment in the function’s value. The derivative is a 
down tuple of three partial derivatives. The first partial derivative 
is the partial derivative with respect to the numerical argument. 
The second partial derivative is a down tuple of partial derivatives, 
with respect to each component of the up-tuple argument. The 
third partial derivative is an up tuple of partial derivatives, with 
respect to each component of the down-tuple argument. 


$$DH(s) = [\partial_0 H(s), \partial_1 H(s), \partial_2 H(s)] = \left[\partial_0 H(s), [\partial_{1,0} H(s), \partial_{1,1} H(s)], (\partial_{2,0} H(s), \partial_{2,1} H(s))\right], \tag{7.25}$$

where $\partial_{1,0}$ indicates the partial derivative with respect to the first component (index 0) of the second argument (index 1) of the function, and so on. Indeed, $\partial_z F = I_z \circ DF$ for any function F and access chain z. So, if we let $\Delta s$ be an incremental phase-space state tuple,

$$\Delta s = (\Delta t, (\Delta x, \Delta y), [\Delta p_x, \Delta p_y]), \tag{7.26}$$

then

$$DH(s)\,\Delta s = \partial_0 H(s)\,\Delta t + \partial_{1,0} H(s)\,\Delta x + \partial_{1,1} H(s)\,\Delta y + \partial_{2,0} H(s)\,\Delta p_x + \partial_{2,1} H(s)\,\Delta p_y. \tag{7.27}$$
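The structure of DH(s) and the contraction (7.27) can be illustrated with a concrete function. Everything here is an assumption chosen for illustration: the sample function H(t; x, y; p_x, p_y) = (p_x² + p_y²)/2 + txy, the representation of states as nested lists (which drops the up/down distinction), and the helper names ref-chain, contract, and state-add. Its derivative is written out by hand.

```scheme
;; States as nested lists (t (x y) (px py)); up/down tags are dropped in this sketch.
(define (state t x y px py) (list t (list x y) (list px py)))
(define (state-t s) (car s))
(define (state-x s) (car (cadr s)))
(define (state-y s) (cadr (cadr s)))
(define (state-px s) (car (caddr s)))
(define (state-py s) (cadr (caddr s)))

;; Sample function (an assumption): H = (px^2 + py^2)/2 + t x y
(define (H s)
  (+ (/ (+ (* (state-px s) (state-px s)) (* (state-py s) (state-py s))) 2)
     (* (state-t s) (state-x s) (state-y s))))

;; DH(s) = [d0 H, [d1,0 H, d1,1 H], (d2,0 H, d2,1 H)], worked out by hand for this H.
(define (DH s)
  (list (* (state-x s) (state-y s))
        (list (* (state-t s) (state-y s)) (* (state-t s) (state-x s)))
        (list (state-px s) (state-py s))))

;; Selecting along an access chain z from DH(s) gives the partial derivative d_z H.
(define (ref-chain structure . chain)
  (if (null? chain)
      structure
      (apply ref-chain (list-ref structure (car chain)) (cdr chain))))

;; Contraction with a like-shaped increment: sum of componentwise products.
(define (contract a b)
  (if (pair? a) (apply + (map contract a b)) (* a b)))

(define (state-add a b)
  (if (pair? a) (map state-add a b) (+ a b)))

(define s  (state 2 1 3 4 5))
(define ds (state 1/1000 1/1000 1/1000 1/1000 1/1000))

(ref-chain (DH s) 1 0)          ; => 6, that is, t y
(contract (DH s) ds)            ; => 1/50, the linear estimate of (7.27)
(- (H (state-add s ds)) (H s))  ; => 20007001/1000000000, close to 1/50
```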


Caution: Partial derivative operators with respect to different 
structured arguments generally do not commute. 

In Scheme we must make explicit choices. We usually assume that phase-space state functions are functions of the tuple. For example,


(define H
  (literal-function 'H
   (-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))


(print-expression
 (H s))
(H (up t (up x y) (down p_x p_y)))


(print-expression
 ((D H) s))
(down
 (((partial 0) H) (up t (up x y) (down p_x p_y)))
 (down (((partial 1 0) H) (up t (up x y) (down p_x p_y)))
       (((partial 1 1) H) (up t (up x y) (down p_x p_y))))
 (up (((partial 2 0) H) (up t (up x y) (down p_x p_y)))
     (((partial 2 1) H) (up t (up x y) (down p_x p_y)))))


Structured results 
Some functions produce structured outputs. A function whose 
output is a tuple is equivalent to a tuple of component functions 
each of which produces one component of the output tuple. 

For example, a function that takes one numerical argument and produces a structure of outputs may be used to describe a curve through space. The following function describes a helical path around the z-axis in three-dimensional space:

$$h(t) = (\cos t, \sin t, t) = (\cos, \sin, I)(t). \tag{7.28}$$


The derivative is just the up tuple of the derivatives of each com- 
ponent of the function: 


$$Dh(t) = (-\sin t, \cos t, 1). \tag{7.29}$$
In Scheme we can write 


(define (helix t)
  (up (cos t) (sin t) t))


or just

(define helix (up cos sin identity))


Computing the derivative in Scheme gives the same up tuple:


(print-expression ((D helix) 't))
(up (* -1 (sin t)) (cos t) 1)
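Because a tuple-valued function is a tuple of component functions, its derivative can be approximated one component at a time. In this plain-Scheme sketch the tuple is a list and the derivative is a central difference with step dt; numeric-derivative and dt are hypothetical names, and the result only approximates (7.29).

```scheme
(define dt 1e-5)                        ; assumed finite-difference step

(define (helix t) (list (cos t) (sin t) t))

;; f maps a number to a list of numbers; differentiate each component.
(define (numeric-derivative f)
  (lambda (t)
    (map (lambda (plus minus) (/ (- plus minus) (* 2 dt)))
         (f (+ t dt))
         (f (- t dt)))))

((numeric-derivative helix) 1.0)   ; close to (-sin 1, cos 1, 1)
(list (- (sin 1.0)) (cos 1.0) 1)   ; the exact values, for comparison
```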


In general, a function that produces structured outputs is just treated as a structure of functions, one for each of the components. The derivative of a function of structured inputs that produces structured outputs is an object that, when contracted with an incremental input structure, produces a linear approximation to the incremental output. Thus, if we define the function g by

$$g(x, y) = \left((x + y)^2, (y - x)^3, e^{x + y}\right), \tag{7.30}$$

then the derivative of g is

$$Dg(x, y) = \left[\begin{pmatrix} 2(x + y) \\ -3(y - x)^2 \\ e^{x + y} \end{pmatrix}, \begin{pmatrix} 2(x + y) \\ 3(y - x)^2 \\ e^{x + y} \end{pmatrix}\right]. \tag{7.31}$$

In Scheme:


(define (g x y)
  (up (square (+ x y)) (cube (- y x)) (exp (+ x y))))


(print-expression ((D g) 'x 'y))
(down (up (+ (* 2 x) (* 2 y))
          (+ (* -3 (expt x 2)) (* 6 x y) (* -3 (expt y 2)))
          (* (exp y) (exp x)))
      (up (+ (* 2 x) (* 2 y))
          (+ (* 3 (expt x 2)) (* -6 x y) (* 3 (expt y 2)))
          (* (exp y) (exp x))))
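Equation (7.31) can be cross-checked numerically. This sketch, which does not use scmutils, lays out the derivative as a list (playing the role of the down tuple) of columns (playing the role of the up tuples), one column per argument, each estimated by a central difference with step dx. The names column-difference, numeric-Dg, and analytic-Dg and the step size are assumptions.

```scheme
(define dx 1e-5)                           ; assumed finite-difference step

(define (g x y)                            ; g(x, y) = ((x+y)^2, (y-x)^3, e^(x+y))
  (list (expt (+ x y) 2) (expt (- y x) 3) (exp (+ x y))))

;; (plus - minus) / (2 dx), component by component
(define (column-difference plus minus)
  (map (lambda (p m) (/ (- p m) (* 2 dx))) plus minus))

(define (numeric-Dg x y)
  (list (column-difference (g (+ x dx) y) (g (- x dx) y))     ; partial 0
        (column-difference (g x (+ y dx)) (g x (- y dx)))))   ; partial 1

(define (analytic-Dg x y)                  ; equation (7.31)
  (list (list (* 2 (+ x y)) (* -3 (expt (- y x) 2)) (exp (+ x y)))
        (list (* 2 (+ x y)) (* 3 (expt (- y x) 2)) (exp (+ x y)))))

(numeric-Dg 1.0 2.0)    ; close to ((6 -3 e^3) (6 3 e^3))
(analytic-Dg 1.0 2.0)
```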


Exercise 7.1: Chain rule 


Let $F(x, y) = x^2y^3$, $G(x, y) = (F(x, y), y)$, and $H(x, y) = F(F(x, y), y)$, so that $H = F \circ G$.

a. Compute $\partial_0 F(x, y)$ and $\partial_1 F(x, y)$.

b. Compute $\partial_0 F(F(x, y), y)$ and $\partial_1 F(F(x, y), y)$.

c. Compute $\partial_0 G(x, y)$ and $\partial_1 G(x, y)$.

d. Compute $DF(a, b)$, $DG(3, 5)$, and $DH(3a^2, 5b^3)$.

Exercise 7.2: Computing derivatives 


We can represent functions of multiple arguments as procedures in sev- 
eral ways, depending upon how we wish to use them. The simplest idea 
is to identify the procedure arguments with the function’s arguments. 

For example, we could write implementations of the functions that 
occur in exercise 7.1 as follows: 


(define (f x y)
  (* (square x) (cube y)))


(define (g x y)
  (up (f x y) y))


(define (h x y)
  (f (f x y) y))


With this choice it is awkward to compose a function with multiple 
arguments, such as f, with a function that produces a tuple of those 
arguments, such as g. Alternatively, we can represent the function ar- 
guments as slots of a tuple data structure, and then composition with 
a function that produces such a data structure is easy. However, this 
choice requires the procedures to build and take apart structures. 

For example, we may define procedures that implement the functions above as follows:


(define (f v)
  (let ((x (ref v 0))
        (y (ref v 1)))
    (* (square x) (cube y))))


(define (g v)
  (let ((x (ref v 0))
        (y (ref v 1)))
    (up (f v) y)))


(define h (compose f g))


Repeat exercise 7.1 using the computer. Explore both implementa- 
tions of multiple-argument functions. 


8 Appendix: Scheme


Programming languages should be designed not by 
piling feature on top of feature, but by removing 
the weaknesses and restrictions that make 
additional features appear necessary. Scheme 
demonstrates that a very small number of rules for 
forming expressions, with no restrictions on how 
they are composed, suffice to form a practical and 
efficient programming language that is flexible 
enough to support most of the major programming 
paradigms in use today. 


Revised³ Report on the Algorithmic Language Scheme (1986).


Here we give an elementary introduction to Scheme.¹ For a more precise explanation of the language see the IEEE standard [21]. For a longer introduction see the textbook [1].

Scheme is a simple programming language based on expressions. 
An expression names a value. For example, the numeral 3.14 
names an approximation to a familiar number. There are primitive 
expressions, such as a numeral, that we directly recognize, and 
there are compound expressions of several kinds. 


Procedure calls 

A procedure call is a kind of compound expression. A procedure 
call is a sequence of expressions delimited by parentheses. The 
first subexpression in a procedure call is taken to name a proce- 
dure, and the rest of the subexpressions are taken to name the 
arguments to that procedure. The value produced by the proce- 
dure when applied to the given arguments is the value named by 
the procedure call. For example, 


¹ Many of the statements here are only valid assuming there are no assignments.


(+ 1 2.14)
3.14 


(+ 1 (* 2 1.07)) 
3.14 


are both compound expressions that name the same number as the numeral 3.14.² In these cases the symbols + and * name
procedures that add and multiply, respectively. If we replace any 
subexpression of any expression with an expression that names 
the same thing as the original subexpression, the thing named by 
the overall expression remains unchanged. In general, a procedure 
call is written 


(operator operand-1 ... operand-n)


where operator names a procedure and operand-i names the ith argument.³


Lambda expressions 

Just as we use numerals to name numbers, we can use λ-expressions⁴ to name procedures. For example, the procedure that squares its input can be written:


(lambda (x) (* x x)) 


This expression can be read: “The procedure of one argument, x, that multiplies x by x.” Of course, we can use this expression in any context where a procedure is needed. For example,


((lambda (x) (* x x)) 4) 
16 


The general form of a λ-expression is:


(lambda formal-parameters body) 


where formal-parameters is a list of symbols that will be the names of the arguments to the procedure and the body is an expression that may refer to the formal parameters. The value of a procedure call is the value of the body of the procedure with the arguments substituted for the formal parameters.


² In examples we show the value that would be printed by the Scheme system using an italic face following the input expression.

³ In Scheme every parenthesis is essential: you cannot add extra parentheses or remove any.

⁴ The logician Alonzo Church [12] invented λ notation to allow the specification of an anonymous function of a named parameter: λx[expression in x]. This is read “That function of one argument that is obtained by substituting the argument for x in the indicated expression.”


Definitions 
We can use the define construct to give a name to any object. 
For example, if we make the definitions 


(define pi 3.141592653589793) 
(define square (lambda (x) (* x x))) 


we can then use the symbols pi and square wherever the numeral or the λ-expression could appear. For example, the area of the surface of a sphere of radius 5 meters is:


(* 4 pi (square 5)) 
314.1592653589793 


Procedure definitions may be expressed more conveniently, using 
“syntactic sugar.” The squaring procedure may be defined 


(define (square x) (* x x)) 


which we may read: “To square x multiply x by x.” 

In Scheme, procedures may be passed as arguments and returned as values. For example, it is possible to make a procedure that implements the mathematical notion of the composition of two functions:⁵


(define compose
  (lambda (f g)
    (lambda (x)
      (f (g x)))))


((compose square sin) 2)
.826821810431806


(square (sin 2))
.826821810431806


⁵ The examples are indented to help with readability. Scheme does not care about extra whitespace, so we may add as much as we please to make things easier to read.


Using the syntactic sugar shown above we can write the defini- 
tion more conveniently. The following are both equivalent to the 
definition above: 


(define (compose f g)
  (lambda (x)
    (f (g x))))


(define ((compose f g) x)
  (f (g x)))


Conditionals 

Conditional expressions may be used to choose among several ex- 
pressions to produce a value. For example, a procedure that im- 
plements the absolute value function may be written: 


(define (abs x)
  (cond ((< x 0) (- x))
        ((= x 0) x)
        ((> x 0) x)))


The conditional cond takes a number of clauses. Each clause has a predicate expression, which may be either true or false, and a consequent expression. The value of the cond expression is the value of the consequent expression of the first clause for which the corresponding predicate expression is true. The general form of a conditional expression is


(cond (predicate-1 consequent-1)
      ...
      (predicate-n consequent-n))


For convenience there is a special predicate expression else that can be used as the predicate in the last clause of a cond. The if construct provides another way to make a conditional when there is only a binary choice to be made. For example, because we only have to do something special when the argument is negative, we could have defined abs as:


(define (abs x)
  (if (< x 0)
      (- x)
      x))


The general form of an if expression is 


(if predicate consequent alternative) 


If the predicate is true the value of the if expression is the value 
of the consequent, otherwise it is the value of the alternative. 
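The else clause described above does not appear in the abs examples. As a small illustration, here is a hypothetical sign procedure (not from the text) whose last clause uses else to catch every case not handled by the earlier clauses:

```scheme
;; sign is a hypothetical example procedure.
(define (sign x)
  (cond ((< x 0) -1)
        ((> x 0) 1)
        (else 0)))     ; reached only when neither earlier predicate is true

(sign -7)    ; => -1
(sign 0)     ; => 0
(sign 2.5)   ; => 1
```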


Recursive procedures 

Given conditionals and definitions we can write recursive proce- 
dures. For example, to compute the nth factorial number we may 
write: 


(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))


(factorial 6) 
720 


(factorial 40) 
815915283247897734345611269596115894272000000000 


Local names 
The let expression is used to give names to objects in a local 
context. For example, 


(define (f radius)
  (let ((area (* 4 pi (square radius)))
        (volume (* 4/3 pi (cube radius))))
    (/ volume area)))


(f 3)
1


The general form of a let expression is


(let ((variable-1 expression-1)
      ...
      (variable-n expression-n))
  body)


The value of the let expression is the value of the body expression 
in the context where the variables variable-i have the values of 
the expressions expression-i. The expressions expression-i may 
not refer to the variables variable-i. 
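The rule that the expressions expression-i may not refer to the variables variable-i means that, inside the binding list, a name still refers to any outer binding. A short example, with arbitrarily chosen names and values:

```scheme
(define x 10)

;; (+ x 1) is evaluated outside the scope of the new binding of x,
;; so it sees the outer x (10): y is bound to 11, not 2.
(let ((x 1)
      (y (+ x 1)))
  (list x y))     ; => (1 11)
```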

A slight variant of the let expression provides a convenient 
way to express looping constructs. We can write a procedure that 
implements an alternative algorithm for computing factorials as 
follows: 


(define (factorial n)
  (let clp ((count 1) (answer 1))
    (if (> count n)
        answer
        (clp (+ count 1) (* count answer)))))


(factorial 6) 
720 


Here, the symbol following the let (in this case clp) is locally de- 
fined to be a procedure that has the variables count and answer 
as its formal parameters. It is called the first time with the ex- 
pressions 1 and 1, initializing the loop. Whenever the procedure 
named clp is called later, these variables get new values, which are 
the values of the operand expressions (+ count 1) and (* count 
answer). 


Compound data—lists and vectors 

Data can be glued together to form compound data structures. 
A list is a data structure in which the elements are linked se- 
quentially. A Scheme vector is a data structure in which the el- 
ements are packed in a linear array. New elements can be added 
to lists, but a list takes computing time proportional to its length 
to access. Scheme vectors can be accessed in constant time, but 
a Scheme vector is of fixed length. All data structures in this 
book are implemented as combinations of lists and Scheme vec- 
tors. Compound data objects are constructed from components by 
procedures called constructors and the components are accessed 
by selectors. 
The procedure list is the constructor for lists. The selector list-ref gets an element of the list. All selectors in Scheme are zero-based. For example,


(define a-list (list 6 946 8 356 12 620)) 


a-list 
(6 946 8 356 12 620) 


(list-ref a-list 3) 
356 


(list-ref a-list 0) 
6 


Lists are built from pairs. A pair is made using the constructor 
cons. The selectors for the two components of the pair are car and cdr.⁶ A list is a chain of pairs, such that the car of each pair
is the list element and the cdr of each pair is the next pair, except 
for the last cdr, which is a distinguishable value called the empty 
list and which is written (). Thus, 


(car a-list) 
6 


(cdr a-list) 
(946 8 356 12 620) 


(car (cdr a-list)) 
946 


(define another-list
  (cons 32 (cdr a-list)))


another-list 
(32 946 8 356 12 620) 


(car (cdr another-list))
946 


Both a-list and another-list share the same tail (their cdr). 


⁶ These names are accidents of history. They stand for “the Contents of the Address Register” and “the Contents of the Decrement Register” of the IBM 704 computer, which was used for the first implementation of Lisp in the late 1950s.
There is a predicate pair? that is true of pairs and false of all other types of data.
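To see why access time for a list grows with the position of the element, here is a sketch of element selection written with car and cdr alone (nth-element is a hypothetical name, not a Scheme primitive). The same session also checks that a-list and another-list share their tail.

```scheme
;; Reaching element k takes k cdr steps.
(define (nth-element lst k)
  (if (= k 0)
      (car lst)
      (nth-element (cdr lst) (- k 1))))

(define a-list (list 6 946 8 356 12 620))
(define another-list (cons 32 (cdr a-list)))

(nth-element a-list 3)                  ; => 356, same as (list-ref a-list 3)
(eq? (cdr a-list) (cdr another-list))   ; => #t, the tail is the same object
(pair? a-list)                          ; => #t
(pair? '())                             ; => #f, the empty list is not a pair
```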

Vectors are simpler than lists. There is a constructor vector 
that can be used to make vectors, and there is a selector vector-ref 
for accessing the elements of a vector: 


(define a-vector
  (vector 37 63 49 21 88 56))


a-vector 
#(37 63 49 21 88 56) 


(vector-ref a-vector 3) 
21 


(vector-ref a-vector 0) 
37 


Notice that a vector is distinguished from a list on printout by the character “#” appearing before the initial parenthesis.

There is a predicate vector? that is true of vectors and false 
on all other types of data. 

The elements of lists and vectors may be any kind of data, in- 
cluding numbers, procedures, lists, and vectors. There are numer- 
ous other procedures for manipulating list-structured data and 
vector-structured data that can be found in the Scheme online 
documentation. 


Symbols 
Symbols are a very important kind of primitive data type that we 
use to make programs and algebraic expressions. You probably 
have noticed that Scheme programs look just like lists. They are 
lists. Some of the elements of the lists that make up programs 
are symbols, such as + and vector. If we are to make programs 
that can manipulate programs we need to be able to write an 
expression that names such a symbol. This is accomplished by 
the mechanism of quotation. The name of the symbol + is the expression '+, and in general the name of an expression is the expression preceded by a single quote character. Thus the name of the expression (+ 3 a) is '(+ 3 a).

We can test whether two symbols are identical with the predicate eq?. Using this we can write a program to determine if an expression is a sum:
(define (sum? expression)
  (and (pair? expression)
       (eq? (car expression) '+)))


(sum? '(+ 3 a))
#t


(sum? '(* 3 a))
#f


Consider what would happen if we were to leave out the quote in the expression (sum? '(+ 3 a)). If the variable a had the value 4 we would be asking if 7 is a sum. But what we wanted to know was whether the expression (+ 3 a) is a sum. That is why we need the quote.
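The point about quotation can be tried directly. This session repeats the definition of sum? and gives a the value 4, as in the paragraph above:

```scheme
(define (sum? expression)
  (and (pair? expression)
       (eq? (car expression) '+)))

(define a 4)

(sum? '(+ 3 a))   ; => #t: the argument is the three-element list (+ 3 a)
(sum? (+ 3 a))    ; => #f: the argument is the number 7, which is not a pair
(+ 3 a)           ; => 7
```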